Patchwork [18,of,18,"] verify: also check full manifest validity during verify runs [RFC]

login
register
mail settings
Submitter Pierre-Yves David
Date March 6, 2019, 4:29 p.m.
Message ID <1e978412397a4fe629df.1551889774@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/39108/
State Accepted
Headers show

Comments

Pierre-Yves David - March 6, 2019, 4:29 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1551881213 -3600
#      Wed Mar 06 15:06:53 2019 +0100
# Node ID 1e978412397a4fe629dfa9420b1a7773dc936b19
# Parent  92091b3de1503bf562fe5bf2b544781f92702738
# EXP-Topic verify
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 1e978412397a
verify: also check full manifest validity during verify runs [RFC]

Before this changes, `hg verify` only checked if a manifest revision existed and
referenced the proper files. However it never checked the manifest revision
content itself.

Mercurial is expecting manifest entries to be sorted and will crash otherwise.
Since `hg verify` did not tried a full restoration of manifest entry, it could
ignore this kind of corruption.

This new check significantly increases the cost of a `hg verify` run. This
especially affects large repository not using `sparse-revlog`. We might want to
put it behind a new flag like `--full`. Another way to look at this would be to
offer a `--quick` flag that disable it. In particular, since `hg recover` runs
verify, this could impact users.
Pulkit Goyal - March 6, 2019, 6:31 p.m.
On Wed, Mar 6, 2019 at 8:10 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1551881213 -3600
> #      Wed Mar 06 15:06:53 2019 +0100
> # Node ID 1e978412397a4fe629dfa9420b1a7773dc936b19
> # Parent  92091b3de1503bf562fe5bf2b544781f92702738
> # EXP-Topic verify
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 1e978412397a
> verify: also check full manifest validity during verify runs [RFC]
>
> Before this changes, `hg verify` only checked if a manifest revision
> existed and
> referenced the proper files. However it never checked the manifest revision
> content itself.
>
> Mercurial is expecting manifest entries to be sorted and will crash
> otherwise.
> Since `hg verify` did not tried a full restoration of manifest entry, it
> could
> ignore this kind of corruption.
>
> This new check significantly increases the cost of a `hg verify` run. This
> especially affects large repository not using `sparse-revlog`. We might
> want to
> put it behind a new flag like `--full`. Another way to look at this would
> be to
> offer a `--quick` flag that disable it. In particular, since `hg recover`
> runs
> verify, this could impact users.
>

A slow `hg recover` is already painful for us whenever we run it. So we
decided whatever happen, we will never Ctrl+C a hg process.

That said, what are the cases which can lead to unsorted manifest
revisions? It will be worth putting them in commit message.
Pierre-Yves David - March 7, 2019, 3:51 p.m.
On 3/6/19 7:31 PM, Pulkit Goyal wrote:
> 
> 
> On Wed, Mar 6, 2019 at 8:10 PM Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>     # Date 1551881213 -3600
>     #      Wed Mar 06 15:06:53 2019 +0100
>     # Node ID 1e978412397a4fe629dfa9420b1a7773dc936b19
>     # Parent  92091b3de1503bf562fe5bf2b544781f92702738
>     # EXP-Topic verify
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r 1e978412397a
>     verify: also check full manifest validity during verify runs [RFC]
> 
>     Before this changes, `hg verify` only checked if a manifest revision
>     existed and
>     referenced the proper files. However it never checked the manifest
>     revision
>     content itself.
> 
>     Mercurial is expecting manifest entries to be sorted and will crash
>     otherwise.
>     Since `hg verify` did not tried a full restoration of manifest
>     entry, it could
>     ignore this kind of corruption.
> 
>     This new check significantly increases the cost of a `hg verify`
>     run. This
>     especially affects large repository not using `sparse-revlog`. We
>     might want to
>     put it behind a new flag like `--full`. Another way to look at this
>     would be to
>     offer a `--quick` flag that disable it. In particular, since `hg
>     recover` runs
>     verify, this could impact users.
> 
> 
> A slow `hg recover` is already painful for us whenever we run it. So we 
> decided whatever happen, we will never Ctrl+C a hg process.

Yeah, I've experienced that too. Overall, the recover logic have been 
pretty solid in the case I end up using it. We should offer a way to run 
recover without the verify step.


> That said, what are the cases which can lead to unsorted manifest 
> revisions? It will be worth putting them in commit message.

There are a chicken and eggs issue here, I needed this command to learn 
more about the issue. And I now do know more:

There are 2 affected manifest used in 3 changesets for Mozilla-try 
(6b58490b1f9b, f13fe2770423, and 58842787f756).
All three changesets dates back from 2015 and are Authored by Mike 
Hommey. So I strongly suspect they were created by an early version of 
cinabar.

(I'm playing with Mozilla try as a test repository for our discovery works).

Even if such corruption will be very rare, I still think that running 
the extra check make sense by default. The following verify helps 
statement is currently wrong: "validating the hashes and checksums of 
each entry in the changelog, manifest, and tracked files". The manifest 
log hashes are not currently checked. This changeset fix this.

However this will make the recover situation worth. Here a wider plan to 
deal with that

1) add a --quick flag to `hg verify` that only check revlog length,
2) add a --verify flag to `hg recover` (--no-verify will skip the verify 
step),
3) make --no-verify the default,
3.5) maybe deprecated the --verify flag,
4) check manifest hashes (and order) by default.

What do you think ?
Pulkit Goyal - March 16, 2019, 10:23 p.m.
On Thu, Mar 7, 2019 at 9:21 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 3/6/19 7:31 PM, Pulkit Goyal wrote:
> >
> >
> > On Wed, Mar 6, 2019 at 8:10 PM Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>
> > wrote:
> >
> >     # HG changeset patch
> >     # User Pierre-Yves David <pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>
> >     # Date 1551881213 -3600
> >     #      Wed Mar 06 15:06:53 2019 +0100
> >     # Node ID 1e978412397a4fe629dfa9420b1a7773dc936b19
> >     # Parent  92091b3de1503bf562fe5bf2b544781f92702738
> >     # EXP-Topic verify
> >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >     #              hg pull
> >     https://bitbucket.org/octobus/mercurial-devel/ -r 1e978412397a
> >     verify: also check full manifest validity during verify runs [RFC]
> >
> >     Before this changes, `hg verify` only checked if a manifest revision
> >     existed and
> >     referenced the proper files. However it never checked the manifest
> >     revision
> >     content itself.
> >
> >     Mercurial is expecting manifest entries to be sorted and will crash
> >     otherwise.
> >     Since `hg verify` did not tried a full restoration of manifest
> >     entry, it could
> >     ignore this kind of corruption.
> >
> >     This new check significantly increases the cost of a `hg verify`
> >     run. This
> >     especially affects large repository not using `sparse-revlog`. We
> >     might want to
> >     put it behind a new flag like `--full`. Another way to look at this
> >     would be to
> >     offer a `--quick` flag that disable it. In particular, since `hg
> >     recover` runs
> >     verify, this could impact users.
> >
> >
> > A slow `hg recover` is already painful for us whenever we run it. So we
> > decided whatever happen, we will never Ctrl+C a hg process.
>
> Yeah, I've experienced that too. Overall, the recover logic have been
> pretty solid in the case I end up using it. We should offer a way to run
> recover without the verify step.
>
>
> > That said, what are the cases which can lead to unsorted manifest
> > revisions? It will be worth putting them in commit message.
>
> There are a chicken and eggs issue here, I needed this command to learn
> more about the issue. And I now do know more:
>
> There are 2 affected manifest used in 3 changesets for Mozilla-try
> (6b58490b1f9b, f13fe2770423, and 58842787f756).
> All three changesets dates back from 2015 and are Authored by Mike
> Hommey. So I strongly suspect they were created by an early version of
> cinabar.
>
> (I'm playing with Mozilla try as a test repository for our discovery
> works).
>
> Even if such corruption will be very rare, I still think that running
> the extra check make sense by default. The following verify helps
> statement is currently wrong: "validating the hashes and checksums of
> each entry in the changelog, manifest, and tracked files". The manifest
> log hashes are not currently checked. This changeset fix this.
>
> However this will make the recover situation worth. Here a wider plan to
> deal with that
>
> 1) add a --quick flag to `hg verify` that only check revlog length,
> 2) add a --verify flag to `hg recover` (--no-verify will skip the verify
> step),
> 3) make --no-verify the default,
> 3.5) maybe deprecated the --verify flag,
> 4) check manifest hashes (and order) by default.
>
> What do you think ?
>

I am bit confused. In 2) we add a --verify flag and in 3.5) we deprecate
that.

I like the idea of `hg recover` having a `--verify` flag. For verify
command, I am not sure which one is better from these:

1) have a `--quick` flag having behavior similar to current verify
2) have a `--full` flag which does full verification
3) have both 1) and 2), keep the default behavior as 1) and in future
change it 2)

I am +1 for checking manifest hashes and order by default provided we have
a way to skip that both in verify and recover.

>
> --
> Pierre-Yves David
>

Patch

diff --git a/mercurial/verify.py b/mercurial/verify.py
--- a/mercurial/verify.py
+++ b/mercurial/verify.py
@@ -330,6 +330,16 @@  class verifier(object):
                         filenodes.setdefault(fullpath, {}).setdefault(fn, lr)
             except Exception as inst:
                 self._exc(lr, _("reading delta %s") % short(n), inst, label)
+            if not dir:
+                try:
+                    # Manifest not in sorted order are invalid and will crash
+                    # Mercurial. We restore each entry to make sure they are
+                    # ordered.
+                    mfdelta = mfl.get(dir, n).read()
+                except Exception as inst:
+                    self._exc(lr, _("reading full manifest %s") % short(n),
+                              inst, label)
+
         if not dir:
             progress.complete()