Patchwork [4,of,4] verify: also check full manifest validity during verify runs

login
register
mail settings
Submitter Pierre-Yves David
Date April 16, 2019, 11:38 p.m.
Message ID <ed796867a06764cd78a5.1555457922@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/39677/
State Superseded
Headers show

Comments

Pierre-Yves David - April 16, 2019, 11:38 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1551881213 -3600
#      Wed Mar 06 15:06:53 2019 +0100
# Node ID ed796867a06764cd78a57b2ed0249353f5809319
# Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
# EXP-Topic verify
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
verify: also check full manifest validity during verify runs

Before this changes, `hg verify` only checked if a manifest revision existed and
referenced the proper files. However it never checked the manifest revision
content itself.

Mercurial is expecting manifest entries to be sorted and will crash otherwise.
Since `hg verify` did not attempted a full restoration of manifest entry, it
could ignore this kind of corruption.

This new check significantly increases the cost of a `hg verify` run. This
especially affects large repository not using `sparse-revlog`. For now, this is
hidden behind the `--full` experimental flag.
via Mercurial-devel - April 17, 2019, 2:59 a.m.
On Tue, Apr 16, 2019, 16:46 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1551881213 -3600
> #      Wed Mar 06 15:06:53 2019 +0100
> # Node ID ed796867a06764cd78a57b2ed0249353f5809319
> # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
> # EXP-Topic verify
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> ed796867a067
> verify: also check full manifest validity during verify runs
>
> Before this changes, `hg verify` only checked if a manifest revision
> existed and
> referenced the proper files. However it never checked the manifest revision
> content itself.
>
> Mercurial is expecting manifest entries to be sorted and will crash
> otherwise.
> Since `hg verify` did not attempted a full restoration of manifest entry,
> it
> could ignore this kind of corruption.
>
> This new check significantly increases the cost of a `hg verify` run. This
> especially affects large repository not using `sparse-revlog`. For now,
> this is
> hidden behind the `--full` experimental flag.
>
> diff --git a/mercurial/verify.py b/mercurial/verify.py
> --- a/mercurial/verify.py
> +++ b/mercurial/verify.py
> @@ -337,6 +337,16 @@ class verifier(object):
>                          filenodes.setdefault(fullpath, {}).setdefault(fn,
> lr)
>              except Exception as inst:
>                  self._exc(lr, _("reading delta %s") % short(n), inst,
> label)
> +            if not dir and self._level >= VERIFY_FULL:
>

What does the "not dir" mean? I guess it's to do this check only for the
root directory when using tree manifests. Should we do it for all
directories?

+                try:
> +                    # Manifest not in sorted order are invalid and will
> crash
> +                    # Mercurial. We restore each entry to make sure they
> are
>

Nit: "restore" almost makes it sound like this is fixing broken entries.
Maybe something like "We read the full manifest at each revision to..."?

+                    # ordered.
> +                    mfdelta = mfl.get(dir, n).read()
> +                except Exception as inst:
> +                    self._exc(lr, _("reading full manifest %s") %
> short(n),
> +                              inst, label)
> +
>          if not dir:
>              progress.complete()
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Pierre-Yves David - April 17, 2019, 10:34 a.m.
On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
> 
> 
> On Tue, Apr 16, 2019, 16:46 Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>     # Date 1551881213 -3600
>     #      Wed Mar 06 15:06:53 2019 +0100
>     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
>     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
>     # EXP-Topic verify
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
>     verify: also check full manifest validity during verify runs
> 
>     Before this changes, `hg verify` only checked if a manifest revision
>     existed and
>     referenced the proper files. However it never checked the manifest
>     revision
>     content itself.
> 
>     Mercurial is expecting manifest entries to be sorted and will crash
>     otherwise.
>     Since `hg verify` did not attempted a full restoration of manifest
>     entry, it
>     could ignore this kind of corruption.
> 
>     This new check significantly increases the cost of a `hg verify`
>     run. This
>     especially affects large repository not using `sparse-revlog`. For
>     now, this is
>     hidden behind the `--full` experimental flag.
> 
>     diff --git a/mercurial/verify.py b/mercurial/verify.py
>     --- a/mercurial/verify.py
>     +++ b/mercurial/verify.py
>     @@ -337,6 +337,16 @@ class verifier(object):
>                               filenodes.setdefault(fullpath,
>     {}).setdefault(fn, lr)
>                   except Exception as inst:
>                       self._exc(lr, _("reading delta %s") % short(n),
>     inst, label)
>     +            if not dir and self._level >= VERIFY_FULL:
> 
> 
> What does the "not dir" mean? I guess it's to do this check only for the 
> root directory when using tree manifests. Should we do it for all 
> directories?

That is used earlier in the same function to denote "the root manifest". 
I think check the root manifest will trigger checks of the sub manifest 
but I am not sure, I am not too familiar tree manifest. Can we double 
check/fix this as a follow up ?

> 
>     +                try:
>     +                    # Manifest not in sorted order are invalid and
>     will crash
>     +                    # Mercurial. We restore each entry to make sure
>     they are
> 
> 
> Nit: "restore" almost makes it sound like this is fixing broken entries. 
> Maybe something like "We read the full manifest at each revision to..."?

Good point, V2, coming
via Mercurial-devel - April 17, 2019, 1:06 p.m.
On Wed, Apr 17, 2019, 03:34 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
> >
> >
> > On Tue, Apr 16, 2019, 16:46 Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>
> > wrote:
> >
> >     # HG changeset patch
> >     # User Pierre-Yves David <pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>
> >     # Date 1551881213 -3600
> >     #      Wed Mar 06 15:06:53 2019 +0100
> >     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
> >     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
> >     # EXP-Topic verify
> >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >     #              hg pull
> >     https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
> >     verify: also check full manifest validity during verify runs
> >
> >     Before this changes, `hg verify` only checked if a manifest revision
> >     existed and
> >     referenced the proper files. However it never checked the manifest
> >     revision
> >     content itself.
> >
> >     Mercurial is expecting manifest entries to be sorted and will crash
> >     otherwise.
> >     Since `hg verify` did not attempted a full restoration of manifest
> >     entry, it
> >     could ignore this kind of corruption.
> >
> >     This new check significantly increases the cost of a `hg verify`
> >     run. This
> >     especially affects large repository not using `sparse-revlog`. For
> >     now, this is
> >     hidden behind the `--full` experimental flag.
> >
> >     diff --git a/mercurial/verify.py b/mercurial/verify.py
> >     --- a/mercurial/verify.py
> >     +++ b/mercurial/verify.py
> >     @@ -337,6 +337,16 @@ class verifier(object):
> >                               filenodes.setdefault(fullpath,
> >     {}).setdefault(fn, lr)
> >                   except Exception as inst:
> >                       self._exc(lr, _("reading delta %s") % short(n),
> >     inst, label)
> >     +            if not dir and self._level >= VERIFY_FULL:
> >
> >
> > What does the "not dir" mean? I guess it's to do this check only for the
> > root directory when using tree manifests. Should we do it for all
> > directories?
>
> That is used earlier in the same function to denote "the root manifest".
> I think check the root manifest will trigger checks of the sub manifest
> but I am not sure, I am not too familiar tree manifest.


I'm pretty sure it's about tree manifests. I asked just to make sure since
it surprised me that you used that as part of the condition here.

Can we double
> check/fix this as a follow up ?
>

It should be easy to check (just add a print statement and run tests, for
example), so I don't see much reason to fix such a simple thing in a
follow-up.

Btw, it would be nice to have tests too, but I understand that that's
harder to do. Thoughts on how it could be done? Prepared bundle or some
python code for writing out a bad manifest entry should work, I guess.


> >
> >     +                try:
> >     +                    # Manifest not in sorted order are invalid and
> >     will crash
> >     +                    # Mercurial. We restore each entry to make sure
> >     they are
> >
> >
> > Nit: "restore" almost makes it sound like this is fixing broken entries.
> > Maybe something like "We read the full manifest at each revision to..."?
>
> Good point, V2, coming
>
> --
> Pierre-Yves David
>
Pierre-Yves David - April 17, 2019, 1:47 p.m.
On 4/17/19 3:06 PM, Martin von Zweigbergk wrote:
> 
> 
> On Wed, Apr 17, 2019, 03:34 Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
>     On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Tue, Apr 16, 2019, 16:46 Pierre-Yves David
>      > <pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>
>     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>>
>      > wrote:
>      >
>      >     # HG changeset patch
>      >     # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>
>      >     <mailto:pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>>
>      >     # Date 1551881213 -3600
>      >     #      Wed Mar 06 15:06:53 2019 +0100
>      >     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
>      >     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
>      >     # EXP-Topic verify
>      >     # Available At https://bitbucket.org/octobus/mercurial-devel/
>      >     #              hg pull
>      > https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
>      >     verify: also check full manifest validity during verify runs
>      >
>      >     Before this changes, `hg verify` only checked if a manifest
>     revision
>      >     existed and
>      >     referenced the proper files. However it never checked the
>     manifest
>      >     revision
>      >     content itself.
>      >
>      >     Mercurial is expecting manifest entries to be sorted and will
>     crash
>      >     otherwise.
>      >     Since `hg verify` did not attempted a full restoration of
>     manifest
>      >     entry, it
>      >     could ignore this kind of corruption.
>      >
>      >     This new check significantly increases the cost of a `hg verify`
>      >     run. This
>      >     especially affects large repository not using
>     `sparse-revlog`. For
>      >     now, this is
>      >     hidden behind the `--full` experimental flag.
>      >
>      >     diff --git a/mercurial/verify.py b/mercurial/verify.py
>      >     --- a/mercurial/verify.py
>      >     +++ b/mercurial/verify.py
>      >     @@ -337,6 +337,16 @@ class verifier(object):
>      >                               filenodes.setdefault(fullpath,
>      >     {}).setdefault(fn, lr)
>      >                   except Exception as inst:
>      >                       self._exc(lr, _("reading delta %s") % short(n),
>      >     inst, label)
>      >     +            if not dir and self._level >= VERIFY_FULL:
>      >
>      >
>      > What does the "not dir" mean? I guess it's to do this check only
>     for the
>      > root directory when using tree manifests. Should we do it for all
>      > directories?
> 
>     That is used earlier in the same function to denote "the root
>     manifest".
>     I think check the root manifest will trigger checks of the sub manifest
>     but I am not sure, I am not too familiar tree manifest.
> 
> 
> I'm pretty sure it's about tree manifests. I asked just to make sure 
> since it surprised me that you used that as part of the condition here.

I know it is about tree manifest, I am reusing a pattern used by earlier 
code.

>     Can we double
>     check/fix this as a follow up ?
> 
> 
> It should be easy to check (just add a print statement and run tests, 
> for example), so I don't see much reason to fix such a simple thing in a 
> follow-up.

The question here is "Does reading the top level manifest will validated 
the content of the sub manifest too". I don't know how reliably to check 
it in a reasonable amount of time. I am not exposed to any tree manifest 
user myself.

> Btw, it would be nice to have tests too, but I understand that that's 
> harder to do. Thoughts on how it could be done? Prepared bundle or some 
> python code for writing out a bad manifest entry should work, I guess.

The corruption I encounter requires to write buggy manifest by hand. I 
agree if would be nice to cover that in the test. However, that requires 
significantly more time and exceed my initial "let's just share that 
code upstream" intend.

Cheers,
via Mercurial-devel - April 17, 2019, 1:49 p.m.
On Wed, Apr 17, 2019, 06:47 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 4/17/19 3:06 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed, Apr 17, 2019, 03:34 Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>
> > wrote:
> >
> >     On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
> >      >
> >      >
> >      > On Tue, Apr 16, 2019, 16:46 Pierre-Yves David
> >      > <pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>
> >     <mailto:pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>>>
> >      > wrote:
> >      >
> >      >     # HG changeset patch
> >      >     # User Pierre-Yves David <pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>
> >      >     <mailto:pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>>
> >      >     # Date 1551881213 -3600
> >      >     #      Wed Mar 06 15:06:53 2019 +0100
> >      >     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
> >      >     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
> >      >     # EXP-Topic verify
> >      >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >      >     #              hg pull
> >      > https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
> >      >     verify: also check full manifest validity during verify runs
> >      >
> >      >     Before this changes, `hg verify` only checked if a manifest
> >     revision
> >      >     existed and
> >      >     referenced the proper files. However it never checked the
> >     manifest
> >      >     revision
> >      >     content itself.
> >      >
> >      >     Mercurial is expecting manifest entries to be sorted and will
> >     crash
> >      >     otherwise.
> >      >     Since `hg verify` did not attempted a full restoration of
> >     manifest
> >      >     entry, it
> >      >     could ignore this kind of corruption.
> >      >
> >      >     This new check significantly increases the cost of a `hg
> verify`
> >      >     run. This
> >      >     especially affects large repository not using
> >     `sparse-revlog`. For
> >      >     now, this is
> >      >     hidden behind the `--full` experimental flag.
> >      >
> >      >     diff --git a/mercurial/verify.py b/mercurial/verify.py
> >      >     --- a/mercurial/verify.py
> >      >     +++ b/mercurial/verify.py
> >      >     @@ -337,6 +337,16 @@ class verifier(object):
> >      >                               filenodes.setdefault(fullpath,
> >      >     {}).setdefault(fn, lr)
> >      >                   except Exception as inst:
> >      >                       self._exc(lr, _("reading delta %s") %
> short(n),
> >      >     inst, label)
> >      >     +            if not dir and self._level >= VERIFY_FULL:
> >      >
> >      >
> >      > What does the "not dir" mean? I guess it's to do this check only
> >     for the
> >      > root directory when using tree manifests. Should we do it for all
> >      > directories?
> >
> >     That is used earlier in the same function to denote "the root
> >     manifest".
> >     I think check the root manifest will trigger checks of the sub
> manifest
> >     but I am not sure, I am not too familiar tree manifest.
> >
> >
> > I'm pretty sure it's about tree manifests. I asked just to make sure
> > since it surprised me that you used that as part of the condition here.
>
> I know it is about tree manifest, I am reusing a pattern used by earlier
> code.
>
> >     Can we double
> >     check/fix this as a follow up ?
> >
> >
> > It should be easy to check (just add a print statement and run tests,
> > for example), so I don't see much reason to fix such a simple thing in a
> > follow-up.
>
> The question here is "Does reading the top level manifest will validated
> the content of the sub manifest too". I don't know how reliably to check
> it in a reasonable amount of time. I am not exposed to any tree manifest
> user myself.
>

Ah, no, it won't read sub manifests. You're simply skipping sub manifests
by including the "not dir" check.


> > Btw, it would be nice to have tests too, but I understand that that's
> > harder to do. Thoughts on how it could be done? Prepared bundle or some
> > python code for writing out a bad manifest entry should work, I guess.
>
> The corruption I encounter requires to write buggy manifest by hand. I
> agree if would be nice to cover that in the test. However, that requires
> significantly more time and exceed my initial "let's just share that
> code upstream" intend.
>
> Cheers,
>
> --
> Pierre-Yves David
>
Pierre-Yves David - April 17, 2019, 1:51 p.m.
On 4/17/19 3:49 PM, Martin von Zweigbergk wrote:
> 
> 
> On Wed, Apr 17, 2019, 06:47 Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
> 
> 
>     On 4/17/19 3:06 PM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Wed, Apr 17, 2019, 03:34 Pierre-Yves David
>      > <pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>
>     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>>
>      > wrote:
>      >
>      >     On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
>      >      >
>      >      >
>      >      > On Tue, Apr 16, 2019, 16:46 Pierre-Yves David
>      >      > <pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>
>      >     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>
>      >     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>
>      >     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>>>
>      >      > wrote:
>      >      >
>      >      >     # HG changeset patch
>      >      >     # User Pierre-Yves David
>     <pierre-yves.david@octobus.net <mailto:pierre-yves.david@octobus.net>
>      >     <mailto:pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>      >      >     <mailto:pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>
>      >     <mailto:pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>>>
>      >      >     # Date 1551881213 -3600
>      >      >     #      Wed Mar 06 15:06:53 2019 +0100
>      >      >     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
>      >      >     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
>      >      >     # EXP-Topic verify
>      >      >     # Available At
>     https://bitbucket.org/octobus/mercurial-devel/
>      >      >     #              hg pull
>      >      > https://bitbucket.org/octobus/mercurial-devel/ -r ed796867a067
>      >      >     verify: also check full manifest validity during
>     verify runs
>      >      >
>      >      >     Before this changes, `hg verify` only checked if a
>     manifest
>      >     revision
>      >      >     existed and
>      >      >     referenced the proper files. However it never checked the
>      >     manifest
>      >      >     revision
>      >      >     content itself.
>      >      >
>      >      >     Mercurial is expecting manifest entries to be sorted
>     and will
>      >     crash
>      >      >     otherwise.
>      >      >     Since `hg verify` did not attempted a full restoration of
>      >     manifest
>      >      >     entry, it
>      >      >     could ignore this kind of corruption.
>      >      >
>      >      >     This new check significantly increases the cost of a
>     `hg verify`
>      >      >     run. This
>      >      >     especially affects large repository not using
>      >     `sparse-revlog`. For
>      >      >     now, this is
>      >      >     hidden behind the `--full` experimental flag.
>      >      >
>      >      >     diff --git a/mercurial/verify.py b/mercurial/verify.py
>      >      >     --- a/mercurial/verify.py
>      >      >     +++ b/mercurial/verify.py
>      >      >     @@ -337,6 +337,16 @@ class verifier(object):
>      >      >                               filenodes.setdefault(fullpath,
>      >      >     {}).setdefault(fn, lr)
>      >      >                   except Exception as inst:
>      >      >                       self._exc(lr, _("reading delta %s")
>     % short(n),
>      >      >     inst, label)
>      >      >     +            if not dir and self._level >= VERIFY_FULL:
>      >      >
>      >      >
>      >      > What does the "not dir" mean? I guess it's to do this
>     check only
>      >     for the
>      >      > root directory when using tree manifests. Should we do it
>     for all
>      >      > directories?
>      >
>      >     That is used earlier in the same function to denote "the root
>      >     manifest".
>      >     I think check the root manifest will trigger checks of the
>     sub manifest
>      >     but I am not sure, I am not too familiar tree manifest.
>      >
>      >
>      > I'm pretty sure it's about tree manifests. I asked just to make sure
>      > since it surprised me that you used that as part of the condition
>     here.
> 
>     I know it is about tree manifest, I am reusing a pattern used by
>     earlier
>     code.
> 
>      >     Can we double
>      >     check/fix this as a follow up ?
>      >
>      >
>      > It should be easy to check (just add a print statement and run
>     tests,
>      > for example), so I don't see much reason to fix such a simple
>     thing in a
>      > follow-up.
> 
>     The question here is "Does reading the top level manifest will
>     validated
>     the content of the sub manifest too". I don't know how reliably to
>     check
>     it in a reasonable amount of time. I am not exposed to any tree
>     manifest
>     user myself.
> 
> 
> Ah, no, it won't read sub manifests. You're simply skipping sub 
> manifests by including the "not dir" check.

so should we just drop the "not dir" ?
via Mercurial-devel - April 17, 2019, 2:27 p.m.
On Wed, Apr 17, 2019, 06:52 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 4/17/19 3:49 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed, Apr 17, 2019, 06:47 Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>
> > wrote:
> >
> >
> >
> >     On 4/17/19 3:06 PM, Martin von Zweigbergk wrote:
> >      >
> >      >
> >      > On Wed, Apr 17, 2019, 03:34 Pierre-Yves David
> >      > <pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>
> >     <mailto:pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>>>
> >      > wrote:
> >      >
> >      >     On 4/17/19 4:59 AM, Martin von Zweigbergk wrote:
> >      >      >
> >      >      >
> >      >      > On Tue, Apr 16, 2019, 16:46 Pierre-Yves David
> >      >      > <pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>
> >      >     <mailto:pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>>
> >      >     <mailto:pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>
> >      >     <mailto:pierre-yves.david@ens-lyon.org
> >     <mailto:pierre-yves.david@ens-lyon.org>>>>
> >      >      > wrote:
> >      >      >
> >      >      >     # HG changeset patch
> >      >      >     # User Pierre-Yves David
> >     <pierre-yves.david@octobus.net <mailto:pierre-yves.david@octobus.net
> >
> >      >     <mailto:pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>
> >      >      >     <mailto:pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>
> >      >     <mailto:pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>>>
> >      >      >     # Date 1551881213 -3600
> >      >      >     #      Wed Mar 06 15:06:53 2019 +0100
> >      >      >     # Node ID ed796867a06764cd78a57b2ed0249353f5809319
> >      >      >     # Parent  9bec7491e9b4cabdfa4d264e5213b1f416ec2607
> >      >      >     # EXP-Topic verify
> >      >      >     # Available At
> >     https://bitbucket.org/octobus/mercurial-devel/
> >      >      >     #              hg pull
> >      >      > https://bitbucket.org/octobus/mercurial-devel/ -r
> ed796867a067
> >      >      >     verify: also check full manifest validity during
> >     verify runs
> >      >      >
> >      >      >     Before this changes, `hg verify` only checked if a
> >     manifest
> >      >     revision
> >      >      >     existed and
> >      >      >     referenced the proper files. However it never checked
> the
> >      >     manifest
> >      >      >     revision
> >      >      >     content itself.
> >      >      >
> >      >      >     Mercurial is expecting manifest entries to be sorted
> >     and will
> >      >     crash
> >      >      >     otherwise.
> >      >      >     Since `hg verify` did not attempted a full restoration
> of
> >      >     manifest
> >      >      >     entry, it
> >      >      >     could ignore this kind of corruption.
> >      >      >
> >      >      >     This new check significantly increases the cost of a
> >     `hg verify`
> >      >      >     run. This
> >      >      >     especially affects large repository not using
> >      >     `sparse-revlog`. For
> >      >      >     now, this is
> >      >      >     hidden behind the `--full` experimental flag.
> >      >      >
> >      >      >     diff --git a/mercurial/verify.py b/mercurial/verify.py
> >      >      >     --- a/mercurial/verify.py
> >      >      >     +++ b/mercurial/verify.py
> >      >      >     @@ -337,6 +337,16 @@ class verifier(object):
> >      >      >
>  filenodes.setdefault(fullpath,
> >      >      >     {}).setdefault(fn, lr)
> >      >      >                   except Exception as inst:
> >      >      >                       self._exc(lr, _("reading delta %s")
> >     % short(n),
> >      >      >     inst, label)
> >      >      >     +            if not dir and self._level >= VERIFY_FULL:
> >      >      >
> >      >      >
> >      >      > What does the "not dir" mean? I guess it's to do this
> >     check only
> >      >     for the
> >      >      > root directory when using tree manifests. Should we do it
> >     for all
> >      >      > directories?
> >      >
> >      >     That is used earlier in the same function to denote "the root
> >      >     manifest".
> >      >     I think check the root manifest will trigger checks of the
> >     sub manifest
> >      >     but I am not sure, I am not too familiar tree manifest.
> >      >
> >      >
> >      > I'm pretty sure it's about tree manifests. I asked just to make
> sure
> >      > since it surprised me that you used that as part of the condition
> >     here.
> >
> >     I know it is about tree manifest, I am reusing a pattern used by
> >     earlier
> >     code.
> >
> >      >     Can we double
> >      >     check/fix this as a follow up ?
> >      >
> >      >
> >      > It should be easy to check (just add a print statement and run
> >     tests,
> >      > for example), so I don't see much reason to fix such a simple
> >     thing in a
> >      > follow-up.
> >
> >     The question here is "Does reading the top level manifest will
> >     validated
> >     the content of the sub manifest too". I don't know how reliably to
> >     check
> >     it in a reasonable amount of time. I am not exposed to any tree
> >     manifest
> >     user myself.
> >
> >
> > Ah, no, it won't read sub manifests. You're simply skipping sub
> > manifests by including the "not dir" check.
>
> so should we just drop the "not dir" ?
>

Yes, that's what I would have expected.


> --
> Pierre-Yves David
>

Patch

diff --git a/mercurial/verify.py b/mercurial/verify.py
--- a/mercurial/verify.py
+++ b/mercurial/verify.py
@@ -337,6 +337,16 @@  class verifier(object):
                         filenodes.setdefault(fullpath, {}).setdefault(fn, lr)
             except Exception as inst:
                 self._exc(lr, _("reading delta %s") % short(n), inst, label)
+            if not dir and self._level >= VERIFY_FULL:
+                try:
+                    # Manifest not in sorted order are invalid and will crash
+                    # Mercurial. We restore each entry to make sure they are
+                    # ordered.
+                    mfdelta = mfl.get(dir, n).read()
+                except Exception as inst:
+                    self._exc(lr, _("reading full manifest %s") % short(n),
+                              inst, label)
+
         if not dir:
             progress.complete()