Patchwork [STABLE] verify: don't init a new subrepo when a missing one is referenced (issue5128)

login
register
mail settings
Submitter Matt Harbison
Date April 12, 2016, 3:20 a.m.
Message ID <op.yfsd84pb9lwrgf@envy>
Download mbox | patch
Permalink /patch/14537/
State Changes Requested
Headers show

Comments

Matt Harbison - April 12, 2016, 3:20 a.m.
On Mon, 04 Apr 2016 03:50:49 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 04/03/2016 01:15 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1459708994 14400
>> #      Sun Apr 03 14:43:14 2016 -0400
>> # Branch stable
>> # Node ID 63ce31ee0e582bc25327b27c0584445b28a90686
>> # Parent  2d39f987f0bacc39a8319cb6c84b2d38c1991028
>> verify: don't init a new subrepo when a missing one is referenced  
>> (issue5128)
>>
>> Initializing a subrepo when one doesn't exist is the right thing to do  
>> when the
>> parent is being updated, but in few other cases.  Unfortunately, there  
>> isn't
>> enough context in the subrepo module to distinguish this case.  This  
>> same issue
>> can be caused with other subrepo aware commands, so there is a general  
>> issue
>> here beyond the scope of this fix.
>>
>> A simpler attempt I tried was to add an '_updating' boolean to  
>> localrepo, and
>> set/clear it around the call to mergemod.update() in hg.updaterepo().   
>> That
>> mostly worked, but doesn't handle the case where archive will clone the  
>> subrepo
>> if it is missing.  (I vaguely recall that there may be other commands  
>> that will
>> clone if needed like this, but certainly not all do.  It seems both  
>> handy, and a
>> bit surprising for what should be a read only operation.  It might be  
>> nice if
>> all commands did this consistently, but we probably need Angel's  
>> subrepo caching
>> first, to not make a mess of the working directory.)
>>
>> It was suggested in the bug discussion to skip looking at the subrepo  
>> links
>> unless -S is specified.  I don't really like that idea because missing  
>> a subrepo
>> or (less likely, but worse) a corrupt .hgsubstate is a problem of the  
>> parent
>> repo when checking out a revision.  The -S option seems like a better  
>> fit for
>> functionality that would recurse into each subrepo and do a full  
>> verification.
>>
>> Ultimately, the default value for 'allowcreate' should probably be  
>> flipped, but
>> since the default behavior was to allow creation, this is less risky  
>> for now.
>
> I think we need at least 2 extra pieces here:
>
> First, I don't think we should silently skip the subrepo, we want at  
> least a warning to tell the user we did not verify one of them.
>
> Second, we probably want a way to disable this and still check all  
> subrepository (at the expense of creating the directory, for now).
>
> (note: I think I prefered the -S approach, but I'm fine with this one).

Any objection to something like this?  (I'll fold and resubmit of course):


This way, the rather uninformative "corrupt" message is limited to  
failures parsing .hgsubstate, but failures in one subrepo don't block  
checking others in the same rev.  Also, no need for a flag to force the  
(bad) repo creation behavior.

My concern would be can we rely on exceptions always having a useful  
message?  I suppose we could handle Abort separate from Exception, but I'm  
not sure what the text should be.

I was going to try to fixup the "corrupt" message on default, but this  
seems to take care of that in a way that is suitable for stable.
Pierre-Yves David - April 13, 2016, 6:09 a.m.
On 04/11/2016 08:20 PM, Matt Harbison wrote:
> On Mon, 04 Apr 2016 03:50:49 -0400, Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org> wrote:
>> On 04/03/2016 01:15 PM, Matt Harbison wrote:
>>> # HG changeset patch
>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>> # Date 1459708994 14400
>>> #      Sun Apr 03 14:43:14 2016 -0400
>>> # Branch stable
>>> # Node ID 63ce31ee0e582bc25327b27c0584445b28a90686
>>> # Parent  2d39f987f0bacc39a8319cb6c84b2d38c1991028
>>> verify: don't init a new subrepo when a missing one is referenced 
>>> (issue5128)
>>>
>>> Initializing a subrepo when one doesn't exist is the right thing to 
>>> do when the
>>> parent is being updated, but in few other cases. Unfortunately, 
>>> there isn't
>>> enough context in the subrepo module to distinguish this case.  This 
>>> same issue
>>> can be caused with other subrepo aware commands, so there is a 
>>> general issue
>>> here beyond the scope of this fix.
>>>
>>> A simpler attempt I tried was to add an '_updating' boolean to 
>>> localrepo, and
>>> set/clear it around the call to mergemod.update() in 
>>> hg.updaterepo().  That
>>> mostly worked, but doesn't handle the case where archive will clone 
>>> the subrepo
>>> if it is missing.  (I vaguely recall that there may be other 
>>> commands that will
>>> clone if needed like this, but certainly not all do.  It seems both 
>>> handy, and a
>>> bit surprising for what should be a read only operation.  It might 
>>> be nice if
>>> all commands did this consistently, but we probably need Angel's 
>>> subrepo caching
>>> first, to not make a mess of the working directory.)
>>>
>>> It was suggested in the bug discussion to skip looking at the 
>>> subrepo links
>>> unless -S is specified.  I don't really like that idea because 
>>> missing a subrepo
>>> or (less likely, but worse) a corrupt .hgsubstate is a problem of 
>>> the parent
>>> repo when checking out a revision.  The -S option seems like a 
>>> better fit for
>>> functionality that would recurse into each subrepo and do a full 
>>> verification.
>>>
>>> Ultimately, the default value for 'allowcreate' should probably be 
>>> flipped, but
>>> since the default behavior was to allow creation, this is less risky 
>>> for now.
>>
>> I think we need at least 2 extra pieces here:
>>
>> First, I don't think we should silently skip the subrepo, we want at 
>> least a warning to tell the user we did not verify one of them.
>>
>> Second, we probably want a way to disable this and still check all 
>> subrepository (at the expense of creating the directory, for now).
>>
>> (note: I think I prefered the -S approach, but I'm fine with this one).
>
> Any objection to something like this?  (I'll fold and resubmit of 
> course):
>
> diff --git a/mercurial/hg.py b/mercurial/hg.py
> --- a/mercurial/hg.py
> +++ b/mercurial/hg.py
> @@ -828,7 +828,10 @@
>              ctx = repo[rev]
>              try:
>                  for subpath in ctx.substate:
> -                    ret = ctx.sub(subpath, 
> allowcreate=False).verify() or ret
> +                    try:
> +                        ret = ctx.sub(subpath, 
> allowcreate=False).verify() or ret
> +                    except Exception as e:
> +                        repo.ui.warn(_('%s: %s\n') % (rev, e))

That is super wide, any way we could narrow this?

On the other hand, this mean we would really be prepared for all. 
However, we probably want to let some Mercurial specific (Abort, etc) 
through.

>              except Exception:
>                  repo.ui.warn(_('.hgsubstate is corrupt in revision 
> %s\n') %
>                               node.short(ctx.node()))
> diff --git a/tests/test-subrepo-missing.t b/tests/test-subrepo-missing.t
> --- a/tests/test-subrepo-missing.t
> +++ b/tests/test-subrepo-missing.t
> @@ -131,10 +131,10 @@
>    checking files
>    2 files, 5 changesets, 5 total revisions
>    checking subrepo links
> -  .hgsubstate is corrupt in revision ef278ff32036
> -  .hgsubstate is corrupt in revision a66de08943b6
> -  .hgsubstate is corrupt in revision 674d05939c1e
> -  .hgsubstate is corrupt in revision a7d05d9055a4
> +  0: repository $TESTTMP\repo\subrepo not found
> +  1: repository $TESTTMP\repo\subrepo not found
> +  3: repository $TESTTMP\repo\subrepo not found
> +  4: repository $TESTTMP\repo\subrepo not found

That's an improvement, so we can probably take that until someone have a 
clear idea of a potential better message.

(beware, your path use \ )

>    $ find subrepo
>    find: `subrepo': * (glob)
>    [1]
>
> This way, the rather uninformative "corrupt" message is limited to 
> failures parsing .hgsubstate, but failures in one subrepo don't block 
> checking others in the same rev.  Also, no need for a flag to force 
> the (bad) repo creation behavior.
>
> My concern would be can we rely on exceptions always having a useful 
> message?  I suppose we could handle Abort separate from Exception, but 
> I'm not sure what the text should be.
>
> I was going to try to fixup the "corrupt" message on default, but this 
> seems to take care of that in a way that is suitable for stable.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Matt Harbison - April 14, 2016, 1:01 a.m.
On Wed, 13 Apr 2016 02:09:08 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 04/11/2016 08:20 PM, Matt Harbison wrote:
>> On Mon, 04 Apr 2016 03:50:49 -0400, Pierre-Yves David  
>> <pierre-yves.david@ens-lyon.org> wrote:
>>> On 04/03/2016 01:15 PM, Matt Harbison wrote:
>>>> # HG changeset patch
>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>> # Date 1459708994 14400
>>>> #      Sun Apr 03 14:43:14 2016 -0400
>>>> # Branch stable
>>>> # Node ID 63ce31ee0e582bc25327b27c0584445b28a90686
>>>> # Parent  2d39f987f0bacc39a8319cb6c84b2d38c1991028
>>>> verify: don't init a new subrepo when a missing one is referenced  
>>>> (issue5128)
>>>>
>>>> Initializing a subrepo when one doesn't exist is the right thing to  
>>>> do when the
>>>> parent is being updated, but in few other cases. Unfortunately, there  
>>>> isn't
>>>> enough context in the subrepo module to distinguish this case.  This  
>>>> same issue
>>>> can be caused with other subrepo aware commands, so there is a  
>>>> general issue
>>>> here beyond the scope of this fix.
>>>>
>>>> A simpler attempt I tried was to add an '_updating' boolean to  
>>>> localrepo, and
>>>> set/clear it around the call to mergemod.update() in  
>>>> hg.updaterepo().  That
>>>> mostly worked, but doesn't handle the case where archive will clone  
>>>> the subrepo
>>>> if it is missing.  (I vaguely recall that there may be other commands  
>>>> that will
>>>> clone if needed like this, but certainly not all do.  It seems both  
>>>> handy, and a
>>>> bit surprising for what should be a read only operation.  It might be  
>>>> nice if
>>>> all commands did this consistently, but we probably need Angel's  
>>>> subrepo caching
>>>> first, to not make a mess of the working directory.)
>>>>
>>>> It was suggested in the bug discussion to skip looking at the subrepo  
>>>> links
>>>> unless -S is specified.  I don't really like that idea because  
>>>> missing a subrepo
>>>> or (less likely, but worse) a corrupt .hgsubstate is a problem of the  
>>>> parent
>>>> repo when checking out a revision.  The -S option seems like a better  
>>>> fit for
>>>> functionality that would recurse into each subrepo and do a full  
>>>> verification.
>>>>
>>>> Ultimately, the default value for 'allowcreate' should probably be  
>>>> flipped, but
>>>> since the default behavior was to allow creation, this is less risky  
>>>> for now.
>>>
>>> I think we need at least 2 extra pieces here:
>>>
>>> First, I don't think we should silently skip the subrepo, we want at  
>>> least a warning to tell the user we did not verify one of them.
>>>
>>> Second, we probably want a way to disable this and still check all  
>>> subrepository (at the expense of creating the directory, for now).
>>>
>>> (note: I think I prefered the -S approach, but I'm fine with this one).
>>
>> Any objection to something like this?  (I'll fold and resubmit of  
>> course):
>>
>> diff --git a/mercurial/hg.py b/mercurial/hg.py
>> --- a/mercurial/hg.py
>> +++ b/mercurial/hg.py
>> @@ -828,7 +828,10 @@
>>              ctx = repo[rev]
>>              try:
>>                  for subpath in ctx.substate:
>> -                    ret = ctx.sub(subpath, allowcreate=False).verify()  
>> or ret
>> +                    try:
>> +                        ret = ctx.sub(subpath,  
>> allowcreate=False).verify() or ret
>> +                    except Exception as e:
>> +                        repo.ui.warn(_('%s: %s\n') % (rev, e))
>
> That is super wide, any way we could narrow this?
>
> On the other hand, this mean we would really be prepared for all.  
> However, we probably want to let some Mercurial specific (Abort, etc)  
> through.

That's what I was thinking.  It looks like the specific error raised for  
the missing + no create case here is RepoError.

The issue with handling Abort here specially by just rethrowing it (or any  
other narrowing), is that it will immediately be caught by the existing  
handler wrapping this loop.  Since this is nested in a revision loop, it  
will simply carry on to the next revision instead of bailing.  And the  
existing handler does need to be that wide there, because subrepo.state()  
raises several Aborts.  I guess we could throw a subclass of Abort from  
subrepo.state().  Handle that and Exception by logging and continuing, but  
other Aborts by bailing.  But that seems hacky, and would probably raise  
eyebrows on stable?

>>              except Exception:
>>                  repo.ui.warn(_('.hgsubstate is corrupt in revision  
>> %s\n') %
>>                               node.short(ctx.node()))
>> diff --git a/tests/test-subrepo-missing.t b/tests/test-subrepo-missing.t
>> --- a/tests/test-subrepo-missing.t
>> +++ b/tests/test-subrepo-missing.t
>> @@ -131,10 +131,10 @@
>>    checking files
>>    2 files, 5 changesets, 5 total revisions
>>    checking subrepo links
>> -  .hgsubstate is corrupt in revision ef278ff32036
>> -  .hgsubstate is corrupt in revision a66de08943b6
>> -  .hgsubstate is corrupt in revision 674d05939c1e
>> -  .hgsubstate is corrupt in revision a7d05d9055a4
>> +  0: repository $TESTTMP\repo\subrepo not found
>> +  1: repository $TESTTMP\repo\subrepo not found
>> +  3: repository $TESTTMP\repo\subrepo not found
>> +  4: repository $TESTTMP\repo\subrepo not found
>
> That's an improvement, so we can probably take that until someone have a  
> clear idea of a potential better message.
>
> (beware, your path use \ )
>
>>    $ find subrepo
>>    find: `subrepo': * (glob)
>>    [1]
>>
>> This way, the rather uninformative "corrupt" message is limited to  
>> failures parsing .hgsubstate, but failures in one subrepo don't block  
>> checking others in the same rev.  Also, no need for a flag to force the  
>> (bad) repo creation behavior.
>>
>> My concern would be can we rely on exceptions always having a useful  
>> message?  I suppose we could handle Abort separate from Exception, but  
>> I'm not sure what the text should be.
>>
>> I was going to try to fixup the "corrupt" message on default, but this  
>> seems to take care of that in a way that is suitable for stable.
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pierre-Yves David - April 16, 2016, 11:49 p.m.
On 04/13/2016 06:01 PM, Matt Harbison wrote:
> On Wed, 13 Apr 2016 02:09:08 -0400, Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org> wrote:
>
>>
>>
>> On 04/11/2016 08:20 PM, Matt Harbison wrote:
>>> On Mon, 04 Apr 2016 03:50:49 -0400, Pierre-Yves David 
>>> <pierre-yves.david@ens-lyon.org> wrote:
>>>> On 04/03/2016 01:15 PM, Matt Harbison wrote:
>>>>> # HG changeset patch
>>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>>> # Date 1459708994 14400
>>>>> #      Sun Apr 03 14:43:14 2016 -0400
>>>>> # Branch stable
>>>>> # Node ID 63ce31ee0e582bc25327b27c0584445b28a90686
>>>>> # Parent  2d39f987f0bacc39a8319cb6c84b2d38c1991028
>>>>> verify: don't init a new subrepo when a missing one is referenced 
>>>>> (issue5128)
>>>>>
>>>>> Initializing a subrepo when one doesn't exist is the right thing 
>>>>> to do when the
>>>>> parent is being updated, but in few other cases. Unfortunately, 
>>>>> there isn't
>>>>> enough context in the subrepo module to distinguish this case.  
>>>>> This same issue
>>>>> can be caused with other subrepo aware commands, so there is a 
>>>>> general issue
>>>>> here beyond the scope of this fix.
>>>>>
>>>>> A simpler attempt I tried was to add an '_updating' boolean to 
>>>>> localrepo, and
>>>>> set/clear it around the call to mergemod.update() in 
>>>>> hg.updaterepo().  That
>>>>> mostly worked, but doesn't handle the case where archive will 
>>>>> clone the subrepo
>>>>> if it is missing.  (I vaguely recall that there may be other 
>>>>> commands that will
>>>>> clone if needed like this, but certainly not all do.  It seems 
>>>>> both handy, and a
>>>>> bit surprising for what should be a read only operation. It might 
>>>>> be nice if
>>>>> all commands did this consistently, but we probably need Angel's 
>>>>> subrepo caching
>>>>> first, to not make a mess of the working directory.)
>>>>>
>>>>> It was suggested in the bug discussion to skip looking at the 
>>>>> subrepo links
>>>>> unless -S is specified.  I don't really like that idea because 
>>>>> missing a subrepo
>>>>> or (less likely, but worse) a corrupt .hgsubstate is a problem of 
>>>>> the parent
>>>>> repo when checking out a revision.  The -S option seems like a 
>>>>> better fit for
>>>>> functionality that would recurse into each subrepo and do a full 
>>>>> verification.
>>>>>
>>>>> Ultimately, the default value for 'allowcreate' should probably be 
>>>>> flipped, but
>>>>> since the default behavior was to allow creation, this is less 
>>>>> risky for now.
>>>>
>>>> I think we need at least 2 extra pieces here:
>>>>
>>>> First, I don't think we should silently skip the subrepo, we want 
>>>> at least a warning to tell the user we did not verify one of them.
>>>>
>>>> Second, we probably want a way to disable this and still check all 
>>>> subrepository (at the expense of creating the directory, for now).
>>>>
>>>> (note: I think I prefered the -S approach, but I'm fine with this 
>>>> one).
>>>
>>> Any objection to something like this?  (I'll fold and resubmit of 
>>> course):
>>>
>>> diff --git a/mercurial/hg.py b/mercurial/hg.py
>>> --- a/mercurial/hg.py
>>> +++ b/mercurial/hg.py
>>> @@ -828,7 +828,10 @@
>>>              ctx = repo[rev]
>>>              try:
>>>                  for subpath in ctx.substate:
>>> -                    ret = ctx.sub(subpath, 
>>> allowcreate=False).verify() or ret
>>> +                    try:
>>> +                        ret = ctx.sub(subpath, 
>>> allowcreate=False).verify() or ret
>>> +                    except Exception as e:
>>> +                        repo.ui.warn(_('%s: %s\n') % (rev, e))
>>
>> That is super wide, any way we could narrow this?
>>
>> On the other hand, this mean we would really be prepared for all. 
>> However, we probably want to let some Mercurial specific (Abort, etc) 
>> through.
>
> That's what I was thinking.  It looks like the specific error raised 
> for the missing + no create case here is RepoError.
>
> The issue with handling Abort here specially by just rethrowing it (or 
> any other narrowing), is that it will immediately be caught by the 
> existing handler wrapping this loop.  Since this is nested in a 
> revision loop, it will simply carry on to the next revision instead of 
> bailing.  And the existing handler does need to be that wide there, 
> because subrepo.state() raises several Aborts.  I guess we could throw 
> a subclass of Abort from subrepo.state(). Handle that and Exception by 
> logging and continuing, but other Aborts by bailing.  But that seems 
> hacky, and would probably raise eyebrows on stable?

We are already using sub class of abort in multiple place (eg HookAbort) 
so having a subclass here seems fine. Could be a bit too big for stable 
not not necessary. Let's build something and we'll decide if we put in 
in 3.8 or 3.9.

Patch

diff --git a/mercurial/hg.py b/mercurial/hg.py
--- a/mercurial/hg.py
+++ b/mercurial/hg.py
@@ -828,7 +828,10 @@ 
              ctx = repo[rev]
              try:
                  for subpath in ctx.substate:
-                    ret = ctx.sub(subpath, allowcreate=False).verify() or  
ret
+                    try:
+                        ret = ctx.sub(subpath,  
allowcreate=False).verify() or ret
+                    except Exception as e:
+                        repo.ui.warn(_('%s: %s\n') % (rev, e))
              except Exception:
                  repo.ui.warn(_('.hgsubstate is corrupt in revision %s\n')  
%
                               node.short(ctx.node()))
diff --git a/tests/test-subrepo-missing.t b/tests/test-subrepo-missing.t
--- a/tests/test-subrepo-missing.t
+++ b/tests/test-subrepo-missing.t
@@ -131,10 +131,10 @@ 
    checking files
    2 files, 5 changesets, 5 total revisions
    checking subrepo links
-  .hgsubstate is corrupt in revision ef278ff32036
-  .hgsubstate is corrupt in revision a66de08943b6
-  .hgsubstate is corrupt in revision 674d05939c1e
-  .hgsubstate is corrupt in revision a7d05d9055a4
+  0: repository $TESTTMP\repo\subrepo not found
+  1: repository $TESTTMP\repo\subrepo not found
+  3: repository $TESTTMP\repo\subrepo not found
+  4: repository $TESTTMP\repo\subrepo not found
    $ find subrepo
    find: `subrepo': * (glob)
    [1]