Patchwork validate: check for spurious incoming filelog entries

login
register
mail settings
Submitter Sune Foldager
Date Dec. 11, 2012, 7:15 p.m.
Message ID <25a36cb85e7acda99015.1355253351@firefly.edlund.dk>
Download mbox | patch
Permalink /patch/64/
State Accepted
Commit f009804e2a439642a918fdca7feff0267845fc9b
Headers show

Comments

Sune Foldager - Dec. 11, 2012, 7:15 p.m.
# HG changeset patch
# User Sune Foldager <cryo at cyanite.org>
# Date 1355253201 -3600
# Node ID 25a36cb85e7acda9901505832a186dfb883c451f
# Parent  5522a7951bd7e2b16831ba1736feb2e9145d7e58
validate: check for spurious incoming filelog entries

Accepting those will lead to "mild corruption", correctly reported as
an error by hg verify, but often not a problem in practice.

Enabled when server.validate is switched on.
Kevin Bullock - Dec. 11, 2012, 8:22 p.m.
On 11 Dec 2012, at 1:15 PM, Sune Foldager wrote:

> # HG changeset patch
> # User Sune Foldager <cryo at cyanite.org>
> # Date 1355253201 -3600
> # Node ID 25a36cb85e7acda9901505832a186dfb883c451f
> # Parent  5522a7951bd7e2b16831ba1736feb2e9145d7e58
> validate: check for spurious incoming filelog entries
> 
> Accepting those will lead to "mild corruption", correctly reported as
> an error by hg verify, but often not a problem in practice.
> 
> Enabled when server.validate is switched on.

Do you have a case where it _is_ a problem in practice?

I'm generally +1 but want to be clearer on the possible impact. It won't really affect anyone who's running `hg verify` on their repos regularly, but that's likely to be the minority of our users.

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Sune Foldager - Dec. 11, 2012, 8:33 p.m.
On 2012-12-11 14:22, Kevin Bullock wrote:
>On 11 Dec 2012, at 1:15 PM, Sune Foldager wrote:
>
>> # HG changeset patch
>> # User Sune Foldager <cryo at cyanite.org>
>> # Date 1355253201 -3600
>> # Node ID 25a36cb85e7acda9901505832a186dfb883c451f
>> # Parent  5522a7951bd7e2b16831ba1736feb2e9145d7e58
>> validate: check for spurious incoming filelog entries
>>
>> Accepting those will lead to "mild corruption", correctly reported as
>> an error by hg verify, but often not a problem in practice.
>>
>> Enabled when server.validate is switched on.
>
>Do you have a case where it _is_ a problem in practice?

Well, server.validate is supposed to protect against corruption, and this is corruption. It is, for instance, a problem when you try to access the filelog (using hg log <filename>) for the file with spurious entries, as the linkrevs will be completely arbitrary (and always wrong).  Basically everything that looks at the filelog is affected.

>I'm generally +1 but want to be clearer on the possible impact. It won't really affect anyone who's running `hg verify` on their repos regularly, but that's likely to be the minority of our users.

No client should ever produce these changesets, but a malicious one might. Or an error in a plugin or similar.

For instance, due to a plugin error, we had a situation at work where a changeset with this problem was pushed to our server, and now everyone has it (and it's 100's of changesets from the tip). We have some tools that inspect the various graphs, talking directly to the mercurial python modules. These fail for the changesets involved, since the invariants that normally hold, don't in this case.

-Sune
Kevin Bullock - Dec. 11, 2012, 8:52 p.m.
On 11 Dec 2012, at 2:33 PM, Sune Foldager wrote:

> On 2012-12-11 14:22, Kevin Bullock wrote:
>> On 11 Dec 2012, at 1:15 PM, Sune Foldager wrote:
>> 
>>> # HG changeset patch
>>> # User Sune Foldager <cryo at cyanite.org>
>>> # Date 1355253201 -3600
>>> # Node ID 25a36cb85e7acda9901505832a186dfb883c451f
>>> # Parent  5522a7951bd7e2b16831ba1736feb2e9145d7e58
>>> validate: check for spurious incoming filelog entries
>>> 
>>> Accepting those will lead to "mild corruption", correctly reported as
>>> an error by hg verify, but often not a problem in practice.
>>> 
>>> Enabled when server.validate is switched on.
>> 
>> Do you have a case where it _is_ a problem in practice?
> 
> Well, server.validate is supposed to protect against corruption, and this is corruption. It is, for instance, a problem when you try to access the filelog (using hg log <filename>) for the file with spurious entries, as the linkrevs will be completely arbitrary (and always wrong).  Basically everything that looks at the filelog is affected.
> 
>> I'm generally +1 but want to be clearer on the possible impact. It won't really affect anyone who's running `hg verify` on their repos regularly, but that's likely to be the minority of our users.
> 
> No client should ever produce these changesets, but a malicious one might. Or an error in a plugin or similar.
> 
> For instance, due to a plugin error, we had a situation at work where a changeset with this problem was pushed to our server, and now everyone has it (and it's 100's of changesets from the tip). We have some tools that inspect the various graphs, talking directly to the mercurial python modules. These fail for the changesets involved, since the invariants that normally hold, don't in this case.

Seems like a good idea to reject such pushes then.

> diff -r 5522a7951bd7 -r 25a36cb85e7a tests/test-push-validation.t
> --- a/tests/test-push-validation.t	Mon Dec 10 12:14:55 2012 -0800
> +++ b/tests/test-push-validation.t	Tue Dec 11 20:13:21 2012 +0100
> @@ -18,7 +18,48 @@
>   updating to branch default
>   2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> 
> +Test spurious filelog entries:
> +
>   $ cd test-clone
> +  $ echo blah >> beta
> +  $ cp .hg/store/data/beta.i tmp1

I had to squint to tell that this line is actually saving the filelog for the sake of the *next* (already existing) test. A comment here would be useful.

Aside from that, LGTM.

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Sune Foldager - Dec. 12, 2012, 11:49 a.m.
On 2012-12-11 14:52, Kevin Bullock wrote:
>On 11 Dec 2012, at 2:33 PM, Sune Foldager wrote:
>> +++ b/tests/test-push-validation.t	Tue Dec 11 20:13:21 2012 +0100
>> @@ -18,7 +18,48 @@
>>   updating to branch default
>>   2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>>
>> +Test spurious filelog entries:
>> +
>>   $ cd test-clone
>> +  $ echo blah >> beta
>> +  $ cp .hg/store/data/beta.i tmp1
>
>I had to squint to tell that this line is actually saving the filelog for the sake of the *next* (already existing) test. A comment here would be useful.
>
>Aside from that, LGTM.

Lesbian Gay Trans... what?  Anyway, Mads is +1 on this as well, I know, so I think I'll add a comment to the test, and push it to crew a bit later.

-Sune
Kevin Bullock - Dec. 12, 2012, 5:11 p.m.
On Dec 12, 2012, at 5:49 AM, Sune Foldager wrote:

> On 2012-12-11 14:52, Kevin Bullock wrote:
>> On 11 Dec 2012, at 2:33 PM, Sune Foldager wrote:
>>> +++ b/tests/test-push-validation.t	Tue Dec 11 20:13:21 2012 +0100
>>> @@ -18,7 +18,48 @@
>>>  updating to branch default
>>>  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>>> 
>>> +Test spurious filelog entries:
>>> +
>>>  $ cd test-clone
>>> +  $ echo blah >> beta
>>> +  $ cp .hg/store/data/beta.i tmp1
>> 
>> I had to squint to tell that this line is actually saving the filelog for the sake of the *next* (already existing) test. A comment here would be useful.
>> 
>> Aside from that, LGTM.
> 
> Lesbian Gay Trans... what?  Anyway, Mads is +1 on this as well, I know, so I think I'll add a comment to the test, and push it to crew a bit later.

Looks Good To Me. Blame Augie for infecting me with that particular acronym. Push away.

pacem in terris / ??? / ?????? / ????????? / ??
Kevin R. Bullock
Augie Fackler - Dec. 12, 2012, 8:01 p.m.
On Dec 12, 2012, at 11:11 AM, Kevin Bullock <kbullock+mercurial at ringworld.org> wrote:

> On Dec 12, 2012, at 5:49 AM, Sune Foldager wrote:
> 
>> On 2012-12-11 14:52, Kevin Bullock wrote:
>>> On 11 Dec 2012, at 2:33 PM, Sune Foldager wrote:
>>>> +++ b/tests/test-push-validation.t	Tue Dec 11 20:13:21 2012 +0100
>>>> @@ -18,7 +18,48 @@
>>>> updating to branch default
>>>> 2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>>>> 
>>>> +Test spurious filelog entries:
>>>> +
>>>> $ cd test-clone
>>>> +  $ echo blah >> beta
>>>> +  $ cp .hg/store/data/beta.i tmp1
>>> 
>>> I had to squint to tell that this line is actually saving the filelog for the sake of the *next* (already existing) test. A comment here would be useful.
>>> 
>>> Aside from that, LGTM.
>> 
>> Lesbian Gay Trans... what?  Anyway, Mads is +1 on this as well, I know, so I think I'll add a comment to the test, and push it to crew a bit later.
> 
> Looks Good To Me. Blame Augie for infecting me with that particular acronym. Push away.

LGTM here too, although code.google.com will likely never check this since it's a harmless situation we'd just ignore (the file node wouldn't be reachable, and it'd eventually be garbage collectable.)

> 
> pacem in terris / ??? / ?????? / ????????? / ??
> Kevin R. Bullock
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Mackall - Jan. 17, 2013, 6:30 p.m.
On Tue, 2012-12-11 at 20:15 +0100, Sune Foldager wrote:
> # HG changeset patch
> # User Sune Foldager <cryo@cyanite.org>
> # Date 1355253201 -3600
> # Node ID 25a36cb85e7acda9901505832a186dfb883c451f
> # Parent  5522a7951bd7e2b16831ba1736feb2e9145d7e58
> validate: check for spurious incoming filelog entries

Looks like this was acked as "please push" ages ago. Queue for default,
thanks.

Patch

diff -r 5522a7951bd7 -r 25a36cb85e7a mercurial/localrepo.py
--- a/mercurial/localrepo.py	Mon Dec 10 12:14:55 2012 -0800
+++ b/mercurial/localrepo.py	Tue Dec 11 20:13:21 2012 +0100
@@ -2430,6 +2430,9 @@ 
                         n = fl.node(new)
                         if n in needs:
                             needs.remove(n)
+                        else:
+                            raise util.Abort(
+                                _("received spurious file revlog entry"))
                     if not needs:
                         del needfiles[f]
             self.ui.progress(_('files'), None)
diff -r 5522a7951bd7 -r 25a36cb85e7a tests/test-push-validation.t
--- a/tests/test-push-validation.t	Mon Dec 10 12:14:55 2012 -0800
+++ b/tests/test-push-validation.t	Tue Dec 11 20:13:21 2012 +0100
@@ -18,7 +18,48 @@ 
   updating to branch default
   2 files updated, 0 files merged, 0 files removed, 0 files unresolved
 
+Test spurious filelog entries:
+
   $ cd test-clone
+  $ echo blah >> beta
+  $ cp .hg/store/data/beta.i tmp1
+  $ hg ci -m 2
+  $ cp .hg/store/data/beta.i tmp2
+  $ hg -q rollback
+  $ mv tmp2 .hg/store/data/beta.i
+  $ echo blah >> beta
+  $ hg ci -m '2 (corrupt)'
+
+Expected to fail:
+
+  $ hg verify
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+   beta at 1: dddc47b3ba30 not in manifests
+  2 files, 2 changesets, 4 total revisions
+  1 integrity errors encountered!
+  (first damaged changeset appears to be 1)
+  [1]
+
+  $ hg push
+  pushing to $TESTTMP/test
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  transaction abort!
+  rollback completed
+  abort: received spurious file revlog entry
+  [255]
+
+  $ hg -q rollback
+  $ mv tmp1 .hg/store/data/beta.i
+  $ echo beta > beta
+
+Test missing filelog entries:
+
   $ cp .hg/store/data/beta.i tmp
   $ echo blah >> beta
   $ hg ci -m '2 (corrupt)'
@@ -37,8 +78,6 @@ 
   (first damaged changeset appears to be 1)
   [1]
 
-Expected to fail:
-
   $ hg push
   pushing to $TESTTMP/test (glob)
   searching for changes