Patchwork [2,of,2] verify: add an option to skip checking transformed revisions

login
register
mail settings
Submitter Jun Wu
Date May 11, 2017, 11:13 p.m.
Message ID <f63f6396ebda80cd0462.1494544411@x1c>
Download mbox | patch
Permalink /patch/20582/
State Accepted
Headers show

Comments

Jun Wu - May 11, 2017, 11:13 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1494543792 25200
#      Thu May 11 16:03:12 2017 -0700
# Node ID f63f6396ebda80cd0462aaedc4dc0d95fa5ca7b0
# Parent  1008583138d3dca114bc9d4998a27e845fbc13b0
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r f63f6396ebda
verify: add an option to skip checking transformed revisions

Previously, "hg verify" verifies everything, which could be undesirable when
there are expensive flag processor contents.

This patch adds an "--skip-transform" flag to skip data transformation. That
means we don't go through potentially expensive "read" flag processor and
the check could be fast. Note: "renamed" is also skipped since it could
trigger a transformation by default.

In the LFS usecase, that means "hg verify --skip-transform" will not
download all LFS blobs, which could be too large to be stored locally.
Augie Fackler - May 13, 2017, 2:13 a.m.
On Thu, May 11, 2017 at 04:13:31PM -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1494543792 25200
> #      Thu May 11 16:03:12 2017 -0700
> # Node ID f63f6396ebda80cd0462aaedc4dc0d95fa5ca7b0
> # Parent  1008583138d3dca114bc9d4998a27e845fbc13b0
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r f63f6396ebda
> verify: add an option to skip checking transformed revisions
>
> Previously, "hg verify" verifies everything, which could be undesirable when
> there are expensive flag processor contents.
>
> This patch adds an "--skip-transform" flag to skip data transformation. That
> means we don't go through potentially expensive "read" flag processor and
> the check could be fast. Note: "renamed" is also skipped since it could
> trigger a transformation by default.
>
> In the LFS usecase, that means "hg verify --skip-transform" will not
> download all LFS blobs, which could be too large to be stored locally.

I'm not in love with the transform name bleeding out to the UX layer
here. Is there any way we can get this kind of functionality without
having to leak things this far?

I'm assuming that for LFS the issue is that we might incur cache
misses for all the things, which would be regrettable. Do we have an
idea of what other storage layers look like? Is it plausible that
someone would want to do full verification of one store but not
another?
Jun Wu - May 13, 2017, 5:41 p.m.
Excerpts from Augie Fackler's message of 2017-05-12 22:13:12 -0400:
> I'm not in love with the transform name bleeding out to the UX layer
> here. Is there any way we can get this kind of functionality without
> having to leak things this far?

How about keeping it as an internal config option but not command line
flags?

> I'm assuming that for LFS the issue is that we might incur cache
> misses for all the things, which would be regrettable. Do we have an
> idea of what other storage layers look like? Is it plausible that
> someone would want to do full verification of one store but not
> another?

Maybe make that option "skip flags" instead of a bool?
Augie Fackler - May 14, 2017, 1:29 p.m.
> On May 13, 2017, at 1:41 PM, Jun Wu <quark@fb.com> wrote:
> 
> Excerpts from Augie Fackler's message of 2017-05-12 22:13:12 -0400:
>> I'm not in love with the transform name bleeding out to the UX layer
>> here. Is there any way we can get this kind of functionality without
>> having to leak things this far?
> 
> How about keeping it as an internal config option but not command line
> flags?
> 
>> I'm assuming that for LFS the issue is that we might incur cache
>> misses for all the things, which would be regrettable. Do we have an
>> idea of what other storage layers look like? Is it plausible that
>> someone would want to do full verification of one store but not
>> another?
> 
> Maybe make that option "skip flags" instead of a bool?

A configlist sounds good to me.
Jun Wu - May 14, 2017, 4:09 p.m.
Excerpts from Augie Fackler's message of 2017-05-14 09:29:55 -0400:
> A configlist sounds good to me.

revlog does not know flag names registered by extensions. So the list cannot
have names but integers. Then I think a single configint is cleaner. If
that's okay, I'll go ahead writing patches.
Augie Fackler - May 14, 2017, 4:10 p.m.
On May 14, 2017 12:09 PM, "Jun Wu" <quark@fb.com> wrote:

Excerpts from Augie Fackler's message of 2017-05-14 09:29:55 -0400:
> A configlist sounds good to me.

revlog does not know flag names registered by extensions. So the list cannot
have names but integers. Then I think a single configint is cleaner. If
that's okay, I'll go ahead writing patches.


A touch unfortunate, but it'll do for now.  Mail away!

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -5444,6 +5444,8 @@  def update(ui, repo, node=None, rev=None
                                 updatecheck=updatecheck)
 
-@command('verify', [])
-def verify(ui, repo):
+@command('verify',
+         [('', 'skip-transform', None,
+           _('do not check transformed file revisions (ADVANCED)'))])
+def verify(ui, repo, **opts):
     """verify the integrity of the repository
 
@@ -5461,4 +5463,7 @@  def verify(ui, repo):
     Returns 0 on success, 1 if errors are encountered.
     """
+    if opts['skip_transform']:
+        # internal config: verify.skiptransform
+        repo.ui.setconfig('verify', 'skiptransform', True)
     return hg.verify(repo)
 
diff --git a/mercurial/verify.py b/mercurial/verify.py
--- a/mercurial/verify.py
+++ b/mercurial/verify.py
@@ -50,4 +50,5 @@  class verifier(object):
         self.refersmf = False
         self.fncachewarned = False
+        self.skiptransform = repo.ui.configbool('verify', 'skiptransform')
 
     def warn(self, msg):
@@ -428,6 +429,10 @@  class verifier(object):
                 #     use either "text" (external), or "rawtext" (in revlog).
                 try:
-                    fl.read(n) # side effect: read content and do checkhash
-                    rp = fl.renamed(n)
+                    skipread = self.skiptransform
+                    if skipread and fl.flags(i) == 0:
+                        skipread = False # no transform when flags is 0
+                    if not skipread:
+                        fl.read(n) # side effect: read content and do checkhash
+                        rp = fl.renamed(n)
                     # the "L1 == L2" check
                     l1 = fl.rawsize(i)
diff --git a/tests/test-completion.t b/tests/test-completion.t
--- a/tests/test-completion.t
+++ b/tests/test-completion.t
@@ -311,5 +311,5 @@  Show all commands + options
   tip: patch, git, style, template
   unbundle: update
-  verify: 
+  verify: skip-transform
   version: template
 
diff --git a/tests/test-help.t b/tests/test-help.t
--- a/tests/test-help.t
+++ b/tests/test-help.t
@@ -507,4 +507,6 @@  Test command without options
       Returns 0 on success, 1 if errors are encountered.
   
+  options:
+  
   (some details hidden, use --verbose to show complete help)
 
diff --git a/tests/test-verify.t b/tests/test-verify.t
--- a/tests/test-verify.t
+++ b/tests/test-verify.t
@@ -318,2 +318,46 @@  test revlog format 0
   1 files, 1 changesets, 1 total revisions
   $ cd ..
+
+test flag processor and --skip-transform
+
+  $ hg init skip-transform
+  $ cd skip-transform
+  $ cat >> .hg/hgrc <<EOF
+  > [extensions]
+  > flagprocesor=$RUNTESTDIR/flagprocessorext.py
+  > EOF
+  $ echo '[BASE64]content' > base64
+  $ hg commit -Aqm 'flag processor content' base64
+  $ hg verify
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  1 files, 1 changesets, 1 total revisions
+
+  $ cat >> $TESTTMP/break-base64.py <<EOF
+  > from __future__ import absolute_import
+  > import base64
+  > base64.b64decode=lambda x: x
+  > EOF
+  $ cat >> .hg/hgrc <<EOF
+  > breakbase64=$TESTTMP/break-base64.py
+  > EOF
+
+  $ hg verify
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+   base64@0: unpacking 794cee7777cb: integrity check failed on data/base64.i:0
+  1 files, 1 changesets, 1 total revisions
+  1 integrity errors encountered!
+  (first damaged changeset appears to be 0)
+  [1]
+  $ hg verify --skip-transform
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  1 files, 1 changesets, 1 total revisions
+