Patchwork [1,of,3] pull: add --subrepos flag

login
register
mail settings
Submitter Matt Harbison
Date Feb. 21, 2013, 4:55 a.m.
Message ID <5125A8BA.8020106@yahoo.com>
Download mbox | patch
Permalink /patch/1039/
State Superseded, archived
Headers show

Comments

Matt Harbison - Feb. 21, 2013, 4:55 a.m.
Angel Ezquerra wrote:
> On Wed, Feb 20, 2013 at 6:57 AM, Matt
> Harbison<matt_harbison@yahoo.com>  wrote:
>> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>>
>>> # HG changeset patch # User Angel
>>> Ezquerra<angel.ezquerra@gmail.com> # Date 1360519226 -3600 # Node
>>> ID abbd26cca35280fb8f784b3f2c02eef71696c47b # Parent
>>> 55b9b294b7544a6a144f627f71f4b770907d5a98 pull: add --subrepos
>>> flag
>>>
>>> The purpose of this new flag is to ensure that you are able to
>>> update to any incoming revision without requiring any network
>>> access. The idea is to make sure that the repository is
>>> self-contained after doing hg pull --subrepos, as long as it
>>> already was self-contained before the pull).
>>>
>>> When the --subrepos flag is enabled, pull will also pull (or
>>> clone) all subrepos that are present on the current revision and
>>> those that are referenced by any of the incoming revisions.
>> I haven't gotten a chance to really play with this yet, so I'm
>> going more off the comments here- I apologize if these answers
>> should be obvious, but I'm not familiar enough with some of the
>> code.
>>
>> - Is there an easy way to tell if the repo is/was self contained?
>> (Maybe incoming -S?)
>
> No there is not. I don't think incoming -S would do the trick since
> that would just tell you if there are _new_ incoming revisions on
> some of the _current_ subrepos. A repo is "self-contained" if it is
> possible to update to any of its revisions withing requiring a pull
> of one or more of its subrepos.
>
> I don't know of any existing mercurial command that would be able to
> give you that information.
>
>> - Is the 'self-contained' bit to limit overhead on each pull, or is
>> there another reason this can't ensure the result is self
>> contained?  'Push' and 'outgoing -S' recognize (almost) everything
>> going in the other direction, so it might be nice to have the same
>> capability with a form of pull.  (I may have found a push bug that
>> I haven't gotten back to yet.)
>
> I'm not sure I understand what you mean.

Consider this (contrived) case:

   1) push a repo and subrepo to remote (remote is now self contained)
   2) strip or rollback the remote subrepo
   3) a top level local repo push will repopulate the remote subrepo

Now reverse it:

   1) pull a repo and subrepo from remote (local is self contained)
   2) strip or rollback the local subrepo
   3) nothing is incoming top level, so the subrepo isn't repopulated (if
you've updated to a working dir without that subrepo).

I'm not sure if there's a less contrived case, or if this matters too
much.  I guess I was just wondering aloud about the symmetry between
push and pull -S (this is certainly much better than it was).

I realize you can't do such a thing without crawling most of the
history.  Are there large public repositories that use subrepos?  I'm
wondering what the performance hit would be.  (It's easy for me to think
something is a good idea when I only have small repos and wouldn't
notice the hit.)

FWIW, largefiles works the same way- if you don't clone or pull with
--all-largefiles, there's no single command to go back and get the files
for all revisions that are not incoming.  That leaves the user wondering
if they really can disconnect from the central repo.

> I don't think you (we?) must give too much importance to this
> "self-contained" concept. It is just a way for me to explain the
> purpose of the patch, and specially to explain why we must look for
> subrepos on all the new incoming revisions, and why we cannot just
> limit ourselves to pulling the subrepos on the current revisions
> (short answer: because new subrepos may appear on the new, incoming
> revisions).
>
> My patch explicitly says that hg pull -S will only make your subrepo
> self-contained if it was already self-contained before. This is in
> order to avoid having to look for subrepos on all the repo history,
> rather than just looking for subrepos on the incoming revision (and
> the current one).
>
>> - The full subrepo gets pulled, even revs not committed to the
>> parent?  I think that's a good thing, because regularly get burned
>> when I 'pull -u' the tree to another machine and then go to apply
>> the rest of a patch queue to the subrepo.
>
> Yes. It is perhaps not optimal but I think it is simpler. In
> addition if different parent repo revisions point to different
> revisions on a subrepo there is no way for us to tell which of those
> subrepo revisions is the one that is closes to tip, or which ones
> are ancestors of the other ones, etc. As a result we would need to
> perform as many pulls on a given repo as the number of different
> revisions of that subrepo that were referenced on the parent repo.
> That is complex and slow, so it is much simpler and possibly faster
> (in some cases at least) to just pull all revisions from each
> subrepo.

OK, I misread the code- I thought each subrepo was getting a pull at
each revision, which I figured would be slow.  I attached a test patch
below- there's nothing special about it, but it helped me with my pull
and outgoing changes (some comments probably still reflect this), so I
changed that to pull and incoming to test your patch.

- I think I see a double pull of a subrepo (search for "hg pull -S -r 3").

- I wonder if the code in this patch can be leveraged to make incoming
print all of the stuff 'pull -S' will grab in the future.

>> I'll try to experiment with this some in the next few days.  I ran
>> into issues with what I'm working on (push, outgoing) with deeply
>> nested subrepos, and also when a parent locks in an earlier subrepo
>> version.  I wonder if deeply nested subrepos will be a problem here
>> since hgsubrepo.pull() doesn't walk its subrepos and pull them.
>
> I must confess that I have not tried that too much. We should
> definitely do this recursively. That being said I hope to get some
> feedback on the current version that I sent to the list first.

Sorry, I got crossed up on that too.  hgsubrepo.pull() ends up calling
_repo.pull(), so it does recurse.  The test below indicates that clone 
won't recurse- it reminds that an update is needed.  (Maybe clone needs 
a -S too as part of these changes?  If you aren't walking the history, I
don't see a way around that because nothing is incoming after a clone,
so you won't see subrepos of a cloned subrepo.)

The other thing worth a test is largefiles- the --all-largefiles
option isn't passed to subrepos, so as it stands, 'pull -S' won't let
you really disconnect, because largefiles in subrepos won't be cached. 
That can be fixed later.

> Cheers,
>
> Angel
>

# HG changeset patch
# Parent 5515fe1a8cbf0097440d327e901ddcaa7f1afa04
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1361421052 18000

tests: adapt various 'push -S -r' tests to 'pull -S'

+  0:5879cf081918 standalone subrepo commit
+
+And finally the legacy push does a full subrepo push.  It seems like
+the exit code might be wrong given that it pushes a subrepo.
+  $ hg pull -S   --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  searching for changes
+  no changes found
Angel Ezquerra - Feb. 22, 2013, 6:28 a.m.
On Thu, Feb 21, 2013 at 5:55 AM, Matt Harbison <matt_harbison@yahoo.com> wrote:
> Angel Ezquerra wrote:
>>
>> On Wed, Feb 20, 2013 at 6:57 AM, Matt
>> Harbison<matt_harbison@yahoo.com>  wrote:
>>>
>>> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>>>
>>>> # HG changeset patch # User Angel
>>>> Ezquerra<angel.ezquerra@gmail.com> # Date 1360519226 -3600 # Node
>>>> ID abbd26cca35280fb8f784b3f2c02eef71696c47b # Parent
>>>> 55b9b294b7544a6a144f627f71f4b770907d5a98 pull: add --subrepos
>>>> flag
>>>>
>>>> The purpose of this new flag is to ensure that you are able to
>>>> update to any incoming revision without requiring any network
>>>> access. The idea is to make sure that the repository is
>>>> self-contained after doing hg pull --subrepos, as long as it
>>>> already was self-contained before the pull).
>>>>
>>>> When the --subrepos flag is enabled, pull will also pull (or
>>>> clone) all subrepos that are present on the current revision and
>>>> those that are referenced by any of the incoming revisions.
>>>
>>> I haven't gotten a chance to really play with this yet, so I'm
>>> going more off the comments here- I apologize if these answers
>>> should be obvious, but I'm not familiar enough with some of the
>>> code.
>>>
>>> - Is there an easy way to tell if the repo is/was self contained?
>>> (Maybe incoming -S?)
>>
>>
>> No there is not. I don't think incoming -S would do the trick since
>> that would just tell you if there are _new_ incoming revisions on
>> some of the _current_ subrepos. A repo is "self-contained" if it is
>> possible to update to any of its revisions withing requiring a pull
>> of one or more of its subrepos.
>>
>> I don't know of any existing mercurial command that would be able to
>> give you that information.
>>
>>> - Is the 'self-contained' bit to limit overhead on each pull, or is
>>> there another reason this can't ensure the result is self
>>> contained?  'Push' and 'outgoing -S' recognize (almost) everything
>>> going in the other direction, so it might be nice to have the same
>>> capability with a form of pull.  (I may have found a push bug that
>>> I haven't gotten back to yet.)
>>
>>
>> I'm not sure I understand what you mean.
>
>
> Consider this (contrived) case:
>
>   1) push a repo and subrepo to remote (remote is now self contained)
>   2) strip or rollback the remote subrepo
>   3) a top level local repo push will repopulate the remote subrepo
>
> Now reverse it:
>
>   1) pull a repo and subrepo from remote (local is self contained)
>   2) strip or rollback the local subrepo
>   3) nothing is incoming top level, so the subrepo isn't repopulated (if
> you've updated to a working dir without that subrepo).
>
> I'm not sure if there's a less contrived case, or if this matters too
> much.  I guess I was just wondering aloud about the symmetry between
> push and pull -S (this is certainly much better than it was).

I don't think this is a very big deal because "pull --subrepos" also
performs a pull on all the repositories that are found on the current
revision. I think this would fix your contrived case.

The only corner case that would not be properly handled by this would
be the case in which pull --subrepos cloned a _new_ subrepo that is
not found on the current revision (i.e. that is introduced on one of
the cloned revisions) and then you striped some revisions from that
"future subrepo". In that case would would still be able to get the
missing revisions as you would do today, i.e. by updating to the
offending revision. You could also use the onsub extension (which is
shipped with TortoiseHg).

BTW, the more I use the onsub extension the more I am convinced that
it should be distributed with mercurial, and even that the onsub
command itself should become a built-in command.

That being said, if we did want to also handle this very small corner
case there are at least a couple of possible solutions:

1. make the --subrepos flag take a revset that would tell it which
revisions to look for subrepos to pull.
2. make --subrepos also look for subrepos on the descendants of the
current revision.

#1 would be nice _if_ we could also skip the revset to get the
behavior that the current patch proposes. However I don't think that
is possible to have a flag that can optionally take a parameter.

#2... maybe that is too much overhead for such a small corner case?

Additionally, I wonder if it would make sense to change the current
pull behavior a little, by checking if any of the subrepos that are
present on the _current_ parent revision point to a revision that does
not exist, and if that is the case fulling from them _up to that
revision_? I think the parent pull time is an excellent time to do
this kind of stuff since you already know that you have an internet
connection...

> I realize you can't do such a thing without crawling most of the
> history.  Are there large public repositories that use subrepos?  I'm
> wondering what the performance hit would be.  (It's easy for me to think
> something is a good idea when I only have small repos and wouldn't
> notice the hit.)

We have repositories with thousands of files and more than 20
subrepos. I don't know if you would consider that big...

> FWIW, largefiles works the same way- if you don't clone or pull with
> --all-largefiles, there's no single command to go back and get the files
> for all revisions that are not incoming.  That leaves the user wondering
> if they really can disconnect from the central repo.

That is true. It would be nice to be able to do that. However there is
a difference between largefiles and subrepos. Largefiles tries to only
get the files that you need, when you need them _by design_. Subrepos
is not meant (IMHO) to do that (or at least it is not necessary for
subrepos to fulfill its main purpose, which is to track "submodules").

>> I don't think you (we?) must give too much importance to this
>> "self-contained" concept. It is just a way for me to explain the
>> purpose of the patch, and specially to explain why we must look for
>> subrepos on all the new incoming revisions, and why we cannot just
>> limit ourselves to pulling the subrepos on the current revisions
>> (short answer: because new subrepos may appear on the new, incoming
>> revisions).
>>
>> My patch explicitly says that hg pull -S will only make your subrepo
>> self-contained if it was already self-contained before. This is in
>> order to avoid having to look for subrepos on all the repo history,
>> rather than just looking for subrepos on the incoming revision (and
>> the current one).
>>
>>> - The full subrepo gets pulled, even revs not committed to the
>>> parent?  I think that's a good thing, because regularly get burned
>>> when I 'pull -u' the tree to another machine and then go to apply
>>> the rest of a patch queue to the subrepo.
>>
>>
>> Yes. It is perhaps not optimal but I think it is simpler. In
>> addition if different parent repo revisions point to different
>> revisions on a subrepo there is no way for us to tell which of those
>> subrepo revisions is the one that is closes to tip, or which ones
>> are ancestors of the other ones, etc. As a result we would need to
>> perform as many pulls on a given repo as the number of different
>> revisions of that subrepo that were referenced on the parent repo.
>> That is complex and slow, so it is much simpler and possibly faster
>> (in some cases at least) to just pull all revisions from each
>> subrepo.
>
>
> OK, I misread the code- I thought each subrepo was getting a pull at
> each revision, which I figured would be slow.  I attached a test patch
> below- there's nothing special about it, but it helped me with my pull
> and outgoing changes (some comments probably still reflect this), so I
> changed that to pull and incoming to test your patch.
>
> - I think I see a double pull of a subrepo (search for "hg pull -S -r 3").

You are right. There is a problem with the current version of the
patch. It will clone the same repo multiple times if the same subrepo
refers to different revisions. I will resend with a fix.

BTW, where do you think the tests for this new feature should go?
Should it go into one of the test-pull*.t files or in one of the
test-subrepo*.t files? Do you think the test that you propose in that
patch would be enough to test this new feature? Maybe it could be made
a bit simpler?

> - I wonder if the code in this patch can be leveraged to make incoming
> print all of the stuff 'pull -S' will grab in the future.

In theory that would be certainly possible. At least we could run the
same "look for subrepos to pull" code and show the list of subrepos
that would be pulled and then run incoming on them. The main problem
is that incoming already has a --subrepos flag, which does something
slightly different... I'm not sure that could be changed...

>>> I'll try to experiment with this some in the next few days.  I ran
>>> into issues with what I'm working on (push, outgoing) with deeply
>>> nested subrepos, and also when a parent locks in an earlier subrepo
>>> version.  I wonder if deeply nested subrepos will be a problem here
>>> since hgsubrepo.pull() doesn't walk its subrepos and pull them.
>>
>>
>> I must confess that I have not tried that too much. We should
>> definitely do this recursively. That being said I hope to get some
>> feedback on the current version that I sent to the list first.
>
>
> Sorry, I got crossed up on that too.  hgsubrepo.pull() ends up calling
> _repo.pull(), so it does recurse.

You are right. That's nice! :-)

> The test below indicates that clone won't
> recurse- it reminds that an update is needed.  (Maybe clone needs a -S too
> as part of these changes?  If you aren't walking the history, I
> don't see a way around that because nothing is incoming after a clone,
> so you won't see subrepos of a cloned subrepo.)

I guess that would make sense in another patch.

> The other thing worth a test is largefiles- the --all-largefiles
> option isn't passed to subrepos, so as it stands, 'pull -S' won't let
> you really disconnect, because largefiles in subrepos won't be cached. That
> can be fixed later.

I agree. We do not really use largefiles (yet) so we have not come
across this (yet) but I'm sure this is something we would like to
improve as well.

One last thing, did you have time to have a look at the code of the
patch series itself? I plan to send an slighly changed version of the
patch series that will only change the first patch to fix the issue
you found (the fix is minimal so the code should be almost the same).
Do you have any comments?

Thanks for looking into this.

Angel
Matt Harbison - Feb. 24, 2013, 8:20 a.m.
Angel Ezquerra wrote:
> On Thu, Feb 21, 2013 at 5:55 AM, Matt Harbison<matt_harbison@yahoo.com>  wrote:
>> Angel Ezquerra wrote:
>>> On Wed, Feb 20, 2013 at 6:57 AM, Matt
>>> Harbison<matt_harbison@yahoo.com>   wrote:
>>>> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>>>>
>>>>> # HG changeset patch # User Angel
>>>>> Ezquerra<angel.ezquerra@gmail.com>  # Date 1360519226 -3600 # Node
>>>>> ID abbd26cca35280fb8f784b3f2c02eef71696c47b # Parent
>>>>> 55b9b294b7544a6a144f627f71f4b770907d5a98 pull: add --subrepos
>>>>> flag
>>>>>
>>>>> The purpose of this new flag is to ensure that you are able to
>>>>> update to any incoming revision without requiring any network
>>>>> access. The idea is to make sure that the repository is
>>>>> self-contained after doing hg pull --subrepos, as long as it
>>>>> already was self-contained before the pull).
>>>>>
>>>>> When the --subrepos flag is enabled, pull will also pull (or
>>>>> clone) all subrepos that are present on the current revision and
>>>>> those that are referenced by any of the incoming revisions.
>>>> I haven't gotten a chance to really play with this yet, so I'm
>>>> going more off the comments here- I apologize if these answers
>>>> should be obvious, but I'm not familiar enough with some of the
>>>> code.
>>>>
>>>> - Is there an easy way to tell if the repo is/was self contained?
>>>> (Maybe incoming -S?)
>>>
>>> No there is not. I don't think incoming -S would do the trick since
>>> that would just tell you if there are _new_ incoming revisions on
>>> some of the _current_ subrepos. A repo is "self-contained" if it is
>>> possible to update to any of its revisions withing requiring a pull
>>> of one or more of its subrepos.
>>>
>>> I don't know of any existing mercurial command that would be able to
>>> give you that information.
>>>
>>>> - Is the 'self-contained' bit to limit overhead on each pull, or is
>>>> there another reason this can't ensure the result is self
>>>> contained?  'Push' and 'outgoing -S' recognize (almost) everything
>>>> going in the other direction, so it might be nice to have the same
>>>> capability with a form of pull.  (I may have found a push bug that
>>>> I haven't gotten back to yet.)
>>>
>>> I'm not sure I understand what you mean.
>>
>> Consider this (contrived) case:
>>
>>    1) push a repo and subrepo to remote (remote is now self contained)
>>    2) strip or rollback the remote subrepo
>>    3) a top level local repo push will repopulate the remote subrepo
>>
>> Now reverse it:
>>
>>    1) pull a repo and subrepo from remote (local is self contained)
>>    2) strip or rollback the local subrepo
>>    3) nothing is incoming top level, so the subrepo isn't repopulated (if
>> you've updated to a working dir without that subrepo).
>>
>> I'm not sure if there's a less contrived case, or if this matters too
>> much.  I guess I was just wondering aloud about the symmetry between
>> push and pull -S (this is certainly much better than it was).
>
> I don't think this is a very big deal because "pull --subrepos" also
> performs a pull on all the repositories that are found on the current
> revision. I think this would fix your contrived case.
 >
> The only corner case that would not be properly handled by this would
> be the case in which pull --subrepos cloned a _new_ subrepo that is
> not found on the current revision (i.e. that is introduced on one of
> the cloned revisions) and then you striped some revisions from that
> "future subrepo". In that case would would still be able to get the
> missing revisions as you would do today, i.e. by updating to the
> offending revision. You could also use the onsub extension (which is
> shipped with TortoiseHg).
>
> BTW, the more I use the onsub extension the more I am convinced that
> it should be distributed with mercurial, and even that the onsub
> command itself should become a built-in command.

I'd be fine with that.  I haven't used it, but from what I've read, it seems 
like a useful crutch to do things that aren't currently possible because they 
aren't well defined, like tagging all repos or opening a branch.  But ideally 
there wouldn't be a need for it with commands that accept a -S.  The principle 
of least surprise says those commands can handle subrepos already.

> That being said, if we did want to also handle this very small corner
> case there are at least a couple of possible solutions:
>
> 1. make the --subrepos flag take a revset that would tell it which
> revisions to look for subrepos to pull.

Why not use the -r option?  But really, I think in this case 'pull -S' == 'pull 
-S -r all()', so I'm not sure if that helps.

> 2. make --subrepos also look for subrepos on the descendants of the
> current revision.
>
> #1 would be nice _if_ we could also skip the revset to get the
> behavior that the current patch proposes. However I don't think that
> is possible to have a flag that can optionally take a parameter.

Agreed.  A special case parameter isn't nice.

> #2... maybe that is too much overhead for such a small corner case?

Not sure.. That's why I was asking if there's large repos to benchmark on :-). 
The other thing I'm wondering is, since we seem to have various caches, maybe 
caching subrepo info would speed things up?  I have no idea what form this would 
take, or even how the existing caches are invalidated, detect corruption, etc. 
The construct I've got in my push patches is:

      f = repo.wjoin('.hgsubstate')
      revs = repo.revs('(adds(%s) or modifies(%s)) and ::(%ln)', f, f, revs)

I'm not sure if this helps cut down overhead- you do have to traverse the whole 
repo, but you don't have to process the .hgsubstate file for revisions where the 
subrepos don't change.  (Though you referenced having the latest path for the 
repo, and that can change without affecting .hgsubstate, so this may not be 
useful anyway.)


> Additionally, I wonder if it would make sense to change the current
> pull behavior a little, by checking if any of the subrepos that are
> present on the _current_ parent revision point to a revision that does
> not exist, and if that is the case fulling from them _up to that
> revision_? I think the parent pull time is an excellent time to do
> this kind of stuff since you already know that you have an internet
> connection...

Can you show a short test case or pseudo test case of this?  I'm not sure what 
you're suggesting here.  Wouldn't the existing pull bring in all of the changes 
to the tip of each subrepo now?  Why back off on that without a -r option?

>> I realize you can't do such a thing without crawling most of the
>> history.  Are there large public repositories that use subrepos?  I'm
>> wondering what the performance hit would be.  (It's easy for me to think
>> something is a good idea when I only have small repos and wouldn't
>> notice the hit.)
>
> We have repositories with thousands of files and more than 20
> subrepos. I don't know if you would consider that big...

Not sure.  I'm assuming that the number of commits is the governing factor when 
walking the history?  But that's almost certainly going to be larger than the 
toy repos I make, or even the real ones I use at work (still less than 1000 in 
the subrepo, and probably less than 200 on the parent).  Just looking for 
something to run benchmarks on.

>> FWIW, largefiles works the same way- if you don't clone or pull with
>> --all-largefiles, there's no single command to go back and get the files
>> for all revisions that are not incoming.  That leaves the user wondering
>> if they really can disconnect from the central repo.
>
> That is true. It would be nice to be able to do that. However there is
> a difference between largefiles and subrepos. Largefiles tries to only
> get the files that you need, when you need them _by design_. Subrepos
> is not meant (IMHO) to do that (or at least it is not necessary for
> subrepos to fulfill its main purpose, which is to track "submodules").

I agree with you about subrepos, but the --all-largefiles flag is a new(ish) 
feature that lets you override that original largefiles design, so you can 
disconnect from the central repo.  (So a very similar motivation for this.)  I 
was going to put in a command to download missing largefiles, but never got 
around to it.  But I don't think a new command would be an option for subrepos 
if we don't get this right, because it really is just a pull.  If we can change 
the pull command in the future if there's a need to handle this, then it may not 
be that big of a deal now.  And I suspect the only hinderance to adding it later 
will be if there's a big performance drop.

>>> I don't think you (we?) must give too much importance to this
>>> "self-contained" concept. It is just a way for me to explain the
>>> purpose of the patch, and specially to explain why we must look for
>>> subrepos on all the new incoming revisions, and why we cannot just
>>> limit ourselves to pulling the subrepos on the current revisions
>>> (short answer: because new subrepos may appear on the new, incoming
>>> revisions).
>>>
>>> My patch explicitly says that hg pull -S will only make your subrepo
>>> self-contained if it was already self-contained before. This is in
>>> order to avoid having to look for subrepos on all the repo history,
>>> rather than just looking for subrepos on the incoming revision (and
>>> the current one).
>>>
>>>> - The full subrepo gets pulled, even revs not committed to the
>>>> parent?  I think that's a good thing, because regularly get burned
>>>> when I 'pull -u' the tree to another machine and then go to apply
>>>> the rest of a patch queue to the subrepo.
>>>
>>> Yes. It is perhaps not optimal but I think it is simpler. In
>>> addition if different parent repo revisions point to different
>>> revisions on a subrepo there is no way for us to tell which of those
>>> subrepo revisions is the one that is closes to tip, or which ones
>>> are ancestors of the other ones, etc. As a result we would need to
>>> perform as many pulls on a given repo as the number of different
>>> revisions of that subrepo that were referenced on the parent repo.
>>> That is complex and slow, so it is much simpler and possibly faster
>>> (in some cases at least) to just pull all revisions from each
>>> subrepo.
>>
>> OK, I misread the code- I thought each subrepo was getting a pull at
>> each revision, which I figured would be slow.  I attached a test patch
>> below- there's nothing special about it, but it helped me with my pull
>> and outgoing changes (some comments probably still reflect this), so I
>> changed that to pull and incoming to test your patch.
>>
>> - I think I see a double pull of a subrepo (search for "hg pull -S -r 3").
>
> You are right. There is a problem with the current version of the
> patch. It will clone the same repo multiple times if the same subrepo
> refers to different revisions. I will resend with a fix.
 >
> BTW, where do you think the tests for this new feature should go?
> Should it go into one of the test-pull*.t files or in one of the
> test-subrepo*.t files? Do you think the test that you propose in that
> patch would be enough to test this new feature? Maybe it could be made
> a bit simpler?
>

I assumed it would go in test-subrepo*t, so anyone hacking on subrepos doesn't 
have to find it in an obscure test, but I can see the same argument from someone 
working on pull().  Probably what tips it to subrepo*t though is that there is 
some existing subrepo infrastructure that doesn't exist in test-pull*t.

I'm not sure that it's comprehensive, and there's probably some duplication.  I 
put it together to e.g. test 'push -r 2' then 'push -r 4', and make sure the 
grandchild repo code path got exercised.  Since you aren't doing anything with 
-r, there may be duplicate pull tests.  The incoming tests probably aren't 
relevant either, though I think they should agree on what they are doing eventually.

Those tests also didn't exercise the subrepo pull path (the dests were empty, so 
clone was used).  I marked various potential issues in there with 'XXX'.
One thing I did specifically was to skew the rev values between the parent and 
child, so parent rev 1 doesn't lock in subrepo rev 1.  This makes it easier to 
test with the -r option (which I realize you aren't doing now, but might be a 
nice to have in the future).

>> - I wonder if the code in this patch can be leveraged to make incoming
>> print all of the stuff 'pull -S' will grab in the future.
>
> In theory that would be certainly possible. At least we could run the
> same "look for subrepos to pull" code and show the list of subrepos
> that would be pulled and then run incoming on them. The main problem
> is that incoming already has a --subrepos flag, which does something
> slightly different... I'm not sure that could be changed...

I think it just recurses into the subrepo and does an incoming on it.  If it 
does something else, it's too subtle for me to have noticed.

I've probably stated bits and pieces of this over the course of a week or so, 
but just to explicitly put it all together, I think in an ideal world we would 
have new pull/incoming, and push/outgoing agree with the same options, and each 
group is the reverse of the other.  It may be fantasy, but:

    incoming       pull          operation
    <no arg>       <no arg>      current behavior, parent repo only
    xxx [1]        -u            current behavior, pull subrepo on update
    -S             -S            (this patch) pull parent + all subrepos to tip
    -S -r <rev>    -S -r <rev>   (future) pull parent to rev, revs of subrepos
                                 used up to that rev

[1] It doesn't seem sensible to add -u to incoming for this, so leaving no 
equivalent form is probably best.

    outgoing       push          operation
    <no arg> [2]   <no arg>      current, parent only and all respectively
    -S             <no arg> [3]  current, push parent + all subrepos to tip
    -S -r <rev>    -r <rev>      (future) push parent to rev, revs of subrepos
                                 used up to that rev

[2] It seems buggy to me that the no arg forms disagree, since outgoing's help 
says "These are the changesets that would be pushed if a push was requested." 
But that ship has probably already sailed.

[3] There is no form of push that takes -S

So the last two lines in each table align in functionality and (roughly) in 
arguments.


>>>> I'll try to experiment with this some in the next few days.  I ran
>>>> into issues with what I'm working on (push, outgoing) with deeply
>>>> nested subrepos, and also when a parent locks in an earlier subrepo
>>>> version.  I wonder if deeply nested subrepos will be a problem here
>>>> since hgsubrepo.pull() doesn't walk its subrepos and pull them.
>>>
>>> I must confess that I have not tried that too much. We should
>>> definitely do this recursively. That being said I hope to get some
>>> feedback on the current version that I sent to the list first.
>>
>> Sorry, I got crossed up on that too.  hgsubrepo.pull() ends up calling
>> _repo.pull(), so it does recurse.
>
> You are right. That's nice! :-)
>
>> The test below indicates that clone won't
>> recurse- it reminds that an update is needed.  (Maybe clone needs a -S too
>> as part of these changes?  If you aren't walking the history, I
>> don't see a way around that because nothing is incoming after a clone,
>> so you won't see subrepos of a cloned subrepo.)
>
> I guess that would make sense in another patch.
>
>> The other thing worth a test is largefiles- the --all-largefiles
>> option isn't passed to subrepos, so as it stands, 'pull -S' won't let
>> you really disconnect, because largefiles in subrepos won't be cached. That
>> can be fixed later.
>
> I agree. We do not really use largefiles (yet) so we have not come
> across this (yet) but I'm sure this is something we would like to
> improve as well.
>
> One last thing, did you have time to have a look at the code of the
> patch series itself? I plan to send an slighly changed version of the
> patch series that will only change the first patch to fix the issue
> you found (the fix is minimal so the code should be almost the same).
> Do you have any comments?

I did, but I'm not as well versed in this area of code- that's why I've kept the 
conversation fairly high level.  I was able to follow your commit message and it 
looks like the code does what it says.  Just curious- is specifying None as the 
revision the right thing to do in pull()?  AFAICT, None gives you the working 
directory... so does that change the results based on what the subrepo is 
currently updated to?

> Thanks for looking into this.
>
> Angel
>
Angel Ezquerra - Feb. 28, 2013, 5:02 p.m.
On Sun, Feb 24, 2013 at 9:20 AM, Matt Harbison <matt_harbison@yahoo.com> wrote:
> Angel Ezquerra wrote:
>>
>> On Thu, Feb 21, 2013 at 5:55 AM, Matt Harbison<matt_harbison@yahoo.com>
>> wrote:
>>>
>>> Angel Ezquerra wrote:
>>>>
>>>> On Wed, Feb 20, 2013 at 6:57 AM, Matt
>>>> Harbison<matt_harbison@yahoo.com>   wrote:
>>>>>
>>>>> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>>>>>
>>>>>> # HG changeset patch # User Angel
>>>>>> Ezquerra<angel.ezquerra@gmail.com>  # Date 1360519226 -3600 # Node
>>>>>> ID abbd26cca35280fb8f784b3f2c02eef71696c47b # Parent
>>>>>> 55b9b294b7544a6a144f627f71f4b770907d5a98 pull: add --subrepos
>>>>>> flag
>>>>>>
>>>>>> The purpose of this new flag is to ensure that you are able to
>>>>>> update to any incoming revision without requiring any network
>>>>>> access. The idea is to make sure that the repository is
>>>>>> self-contained after doing hg pull --subrepos, as long as it
>>>>>> already was self-contained before the pull).
>>>>>>
>>>>>> When the --subrepos flag is enabled, pull will also pull (or
>>>>>> clone) all subrepos that are present on the current revision and
>>>>>> those that are referenced by any of the incoming revisions.
>>>>>
>>>>> I haven't gotten a chance to really play with this yet, so I'm
>>>>> going more off the comments here- I apologize if these answers
>>>>> should be obvious, but I'm not familiar enough with some of the
>>>>> code.
>>>>>
>>>>> - Is there an easy way to tell if the repo is/was self contained?
>>>>> (Maybe incoming -S?)
>>>>
>>>>
>>>> No there is not. I don't think incoming -S would do the trick since
>>>> that would just tell you if there are _new_ incoming revisions on
>>>> some of the _current_ subrepos. A repo is "self-contained" if it is
>>>> possible to update to any of its revisions withing requiring a pull
>>>> of one or more of its subrepos.
>>>>
>>>> I don't know of any existing mercurial command that would be able to
>>>> give you that information.
>>>>
>>>>> - Is the 'self-contained' bit to limit overhead on each pull, or is
>>>>> there another reason this can't ensure the result is self
>>>>> contained?  'Push' and 'outgoing -S' recognize (almost) everything
>>>>> going in the other direction, so it might be nice to have the same
>>>>> capability with a form of pull.  (I may have found a push bug that
>>>>> I haven't gotten back to yet.)
>>>>
>>>>
>>>> I'm not sure I understand what you mean.
>>>
>>>
>>> Consider this (contrived) case:
>>>
>>>    1) push a repo and subrepo to remote (remote is now self contained)
>>>    2) strip or rollback the remote subrepo
>>>    3) a top level local repo push will repopulate the remote subrepo
>>>
>>> Now reverse it:
>>>
>>>    1) pull a repo and subrepo from remote (local is self contained)
>>>    2) strip or rollback the local subrepo
>>>    3) nothing is incoming top level, so the subrepo isn't repopulated (if
>>> you've updated to a working dir without that subrepo).
>>>
>>> I'm not sure if there's a less contrived case, or if this matters too
>>> much.  I guess I was just wondering aloud about the symmetry between
>>> push and pull -S (this is certainly much better than it was).
>>
>>
>> I don't think this is a very big deal because "pull --subrepos" also
>> performs a pull on all the repositories that are found on the current
>> revision. I think this would fix your contrived case.
>
>>
>>
>> The only corner case that would not be properly handled by this would
>> be the case in which pull --subrepos cloned a _new_ subrepo that is
>> not found on the current revision (i.e. that is introduced on one of
>> the cloned revisions) and then you striped some revisions from that
>> "future subrepo". In that case would would still be able to get the
>> missing revisions as you would do today, i.e. by updating to the
>> offending revision. You could also use the onsub extension (which is
>> shipped with TortoiseHg).
>>
>> BTW, the more I use the onsub extension the more I am convinced that
>> it should be distributed with mercurial, and even that the onsub
>> command itself should become a built-in command.
>
>
> I'd be fine with that.  I haven't used it, but from what I've read, it seems
> like a useful crutch to do things that aren't currently possible because
> they aren't well defined, like tagging all repos or opening a branch.  But
> ideally there wouldn't be a need for it with commands that accept a -S.  The
> principle of least surprise says those commands can handle subrepos already.

The cool thing about the onsub command is that it lets you operate on
subrepos without operating on the parent repo, which often is what you
want. Using the -S option on mercurial commands does not work the same
because it (usually) will also perform an action on the parent repo.

>> That being said, if we did want to also handle this very small corner
>> case there are at least a couple of possible solutions:
>>
>> 1. make the --subrepos flag take a revset that would tell it which
>> revisions to look for subrepos to pull.
>
>
> Why not use the -r option?  But really, I think in this case 'pull -S' ==
> 'pull -S -r all()', so I'm not sure if that helps.

I think it could help a little. We must just make sure that 'pull -S
-r .' does what I would expect, which is to just pull all current
subrepos.

>> 2. make --subrepos also look for subrepos on the descendants of the
>> current revision.
>>
>> #1 would be nice _if_ we could also skip the revset to get the
>> behavior that the current patch proposes. However I don't think that
>> is possible to have a flag that can optionally take a parameter.
>
> Agreed.  A special case parameter isn't nice.

I guess if this really became an issue we could think about an
--all-subrepos options, as in large files, but I don't really think it
will be necessary. Also perhaps using pull -r 0:tip would work. I'll
have to check.

>> #2... maybe that is too much overhead for such a small corner case?
>
> Not sure.. That's why I was asking if there's large repos to benchmark on
> :-). The other thing I'm wondering is, since we seem to have various caches,
> maybe caching subrepo info would speed things up?  I have no idea what form
> this would take, or even how the existing caches are invalidated, detect
> corruption, etc. The construct I've got in my push patches is:
>
>      f = repo.wjoin('.hgsubstate')
>      revs = repo.revs('(adds(%s) or modifies(%s)) and ::(%ln)', f, f, revs)
>
> I'm not sure if this helps cut down overhead- you do have to traverse the
> whole repo, but you don't have to process the .hgsubstate file for revisions
> where the subrepos don't change.  (Though you referenced having the latest
> path for the repo, and that can change without affecting .hgsubstate, so
> this may not be useful anyway.)

I think the code could easily check if the parent repo refers to a
revision that is missing on the subrepo and then only pull those
subrepos that were missing revisions (or new subrepos). However this
would make it hard to just pull all subrepos, which you need to do
when you want to move a subrepo forward (i.e. select a newer revision
on a subrepo).

My ideal pull behavior would be for pull without --subrepos to
actually perform this check and pull the minimum amount of revisions
needed to make the current parent repo revision consistent. Adding
--subrepos would pull all current and new subrepos (as this patch
proposes), regardless of whether they have changed.

That being said, I am not proposing (yet) to change the current pull
(without --subrepos) behavior. I first want to add the --subrepos flag
and make it do something useful.

>> Additionally, I wonder if it would make sense to change the current
>> pull behavior a little, by checking if any of the subrepos that are
>> present on the _current_ parent revision point to a revision that does
>> not exist, and if that is the case fulling from them _up to that
>> revision_? I think the parent pull time is an excellent time to do
>> this kind of stuff since you already know that you have an internet
>> connection...
>
>
> Can you show a short test case or pseudo test case of this?  I'm not sure
> what you're suggesting here.  Wouldn't the existing pull bring in all of the
> changes to the tip of each subrepo now?  Why back off on that without a -r
> option?

Umm, what I had in mind was the following scenario:

1. You pull a repo that has some subrepos.
2. You update to tip, which pulls all subrepos.
3. You enter into one of the subrepos
4. You strip some revisions on that subrepo.

How do you get those missing revisions back?

However I've been thinking about this and it is not really necessary.
You can just do "hg update --clean -r ." and that will pull from all
subrepos that are missing revisions, which is what I wanted.

So, never mind... Just forget about that :-)

>>> I realize you can't do such a thing without crawling most of the
>>> history.  Are there large public repositories that use subrepos?  I'm
>>> wondering what the performance hit would be.  (It's easy for me to think
>>> something is a good idea when I only have small repos and wouldn't
>>> notice the hit.)
>>
>>
>> We have repositories with thousands of files and more than 20
>> subrepos. I don't know if you would consider that big...
>
>
> Not sure.  I'm assuming that the number of commits is the governing factor
> when walking the history?  But that's almost certainly going to be larger
> than the toy repos I make, or even the real ones I use at work (still less
> than 1000 in the subrepo, and probably less than 200 on the parent).  Just
> looking for something to run benchmarks on.

I think fully pulling all subrepos when you explicitly ask to by using
the --subrepos flag would be acceptable in any case.

>>> FWIW, largefiles works the same way- if you don't clone or pull with
>>> --all-largefiles, there's no single command to go back and get the files
>>> for all revisions that are not incoming.  That leaves the user wondering
>>> if they really can disconnect from the central repo.
>>
>>
>> That is true. It would be nice to be able to do that. However there is
>> a difference between largefiles and subrepos. Largefiles tries to only
>> get the files that you need, when you need them _by design_. Subrepos
>> is not meant (IMHO) to do that (or at least it is not necessary for
>> subrepos to fulfill its main purpose, which is to track "submodules").
>
>
> I agree with you about subrepos, but the --all-largefiles flag is a new(ish)
> feature that lets you override that original largefiles design, so you can
> disconnect from the central repo.  (So a very similar motivation for this.)
> I was going to put in a command to download missing largefiles, but never
> got around to it.  But I don't think a new command would be an option for
> subrepos if we don't get this right, because it really is just a pull.  If
> we can change the pull command in the future if there's a need to handle
> this, then it may not be that big of a deal now.  And I suspect the only
> hinderance to adding it later will be if there's a big performance drop.

If this became a problem we could think about adding some other flag
in the future, or perhaps integrating the onsub extension into core?

>>>> I don't think you (we?) must give too much importance to this
>>>> "self-contained" concept. It is just a way for me to explain the
>>>> purpose of the patch, and specially to explain why we must look for
>>>> subrepos on all the new incoming revisions, and why we cannot just
>>>> limit ourselves to pulling the subrepos on the current revisions
>>>> (short answer: because new subrepos may appear on the new, incoming
>>>> revisions).
>>>>
>>>> My patch explicitly says that hg pull -S will only make your subrepo
>>>> self-contained if it was already self-contained before. This is in
>>>> order to avoid having to look for subrepos on all the repo history,
>>>> rather than just looking for subrepos on the incoming revision (and
>>>> the current one).
>>>>
>>>>> - The full subrepo gets pulled, even revs not committed to the
>>>>> parent?  I think that's a good thing, because regularly get burned
>>>>> when I 'pull -u' the tree to another machine and then go to apply
>>>>> the rest of a patch queue to the subrepo.
>>>>
>>>>
>>>> Yes. It is perhaps not optimal but I think it is simpler. In
>>>> addition if different parent repo revisions point to different
>>>> revisions on a subrepo there is no way for us to tell which of those
>>>> subrepo revisions is the one that is closes to tip, or which ones
>>>> are ancestors of the other ones, etc. As a result we would need to
>>>> perform as many pulls on a given repo as the number of different
>>>> revisions of that subrepo that were referenced on the parent repo.
>>>> That is complex and slow, so it is much simpler and possibly faster
>>>> (in some cases at least) to just pull all revisions from each
>>>> subrepo.
>>>
>>>
>>> OK, I misread the code- I thought each subrepo was getting a pull at
>>> each revision, which I figured would be slow.  I attached a test patch
>>> below- there's nothing special about it, but it helped me with my pull
>>> and outgoing changes (some comments probably still reflect this), so I
>>> changed that to pull and incoming to test your patch.
>>>
>>> - I think I see a double pull of a subrepo (search for "hg pull -S -r
>>> 3").
>>
>>
>> You are right. There is a problem with the current version of the
>> patch. It will clone the same repo multiple times if the same subrepo
>> refers to different revisions. I will resend with a fix.
>
>>
>>
>> BTW, where do you think the tests for this new feature should go?
>> Should it go into one of the test-pull*.t files or in one of the
>> test-subrepo*.t files? Do you think the test that you propose in that
>> patch would be enough to test this new feature? Maybe it could be made
>> a bit simpler?
>>
>
> I assumed it would go in test-subrepo*t, so anyone hacking on subrepos
> doesn't have to find it in an obscure test, but I can see the same argument
> from someone working on pull().  Probably what tips it to subrepo*t though
> is that there is some existing subrepo infrastructure that doesn't exist in
> test-pull*t.

Yes, I think you are right. It should probably go into test-subrepo-recursion.t.

> I'm not sure that it's comprehensive, and there's probably some duplication.
> I put it together to e.g. test 'push -r 2' then 'push -r 4', and make sure
> the grandchild repo code path got exercised.  Since you aren't doing
> anything with -r, there may be duplicate pull tests.  The incoming tests
> probably aren't relevant either, though I think they should agree on what
> they are doing eventually.
>
> Those tests also didn't exercise the subrepo pull path (the dests were
> empty, so clone was used).  I marked various potential issues in there with
> 'XXX'.
> One thing I did specifically was to skew the rev values between the parent
> and child, so parent rev 1 doesn't lock in subrepo rev 1.  This makes it
> easier to test with the -r option (which I realize you aren't doing now, but
> might be a nice to have in the future).

I think I will take some inspiration from your test and write a new
one specific to my proposed patch.

>>> - I wonder if the code in this patch can be leveraged to make incoming
>>> print all of the stuff 'pull -S' will grab in the future.
>>
>>
>> In theory that would be certainly possible. At least we could run the
>> same "look for subrepos to pull" code and show the list of subrepos
>> that would be pulled and then run incoming on them. The main problem
>> is that incoming already has a --subrepos flag, which does something
>> slightly different... I'm not sure that could be changed...
>
>
> I think it just recurses into the subrepo and does an incoming on it.  If it
> does something else, it's too subtle for me to have noticed.
>
> I've probably stated bits and pieces of this over the course of a week or
> so, but just to explicitly put it all together, I think in an ideal world we
> would have new pull/incoming, and push/outgoing agree with the same options,
> and each group is the reverse of the other.  It may be fantasy, but:
>
>    incoming       pull          operation
>    <no arg>       <no arg>      current behavior, parent repo only
>    xxx [1]        -u            current behavior, pull subrepo on update
>    -S             -S            (this patch) pull parent + all subrepos to
> tip
>    -S -r <rev>    -S -r <rev>   (future) pull parent to rev, revs of
> subrepos
>                                 used up to that rev

I think your -S -r <rev> idea is worth thinking about. This could mean
that if <rev> where revs that you already had in your parent repo you
would still pull from the corresponding subrepos, right?


> [1] It doesn't seem sensible to add -u to incoming for this, so leaving no
> equivalent form is probably best.
>
>    outgoing       push          operation
>    <no arg> [2]   <no arg>      current, parent only and all respectively
>    -S             <no arg> [3]  current, push parent + all subrepos to tip
>    -S -r <rev>    -r <rev>      (future) push parent to rev, revs of
> subrepos
>                                 used up to that rev

The problem with this is that if a subrepo changes more than once in a
<rev> range there is not cheap way to only send those revisions (and
their descendants) AFAIK. What I mean is that if subrepo S changes
from revision A to B and then from B to C on two different parent
revisions, you cannot just do push -r C on the subrepo, because B and
C may be on different branches and not have an ancestor-descendant
relationship. In that case you only have two choices. Either you push
twice ("push -r A" then "push -r C" or you push all).

> [2] It seems buggy to me that the no arg forms disagree, since outgoing's
> help says "These are the changesets that would be pushed if a push was
> requested." But that ship has probably already sailed.

Unfortunately I think that is true.

> [3] There is no form of push that takes -S

Since I have another patch series (waiting for review!) that makes
push only push "cleanstore" subrepos, it may make sense to add a "push
-S" flag to force pushing all subrepos regardless of the cleanstore
state.

> So the last two lines in each table align in functionality and (roughly) in
> arguments.
>
>
>
>>>>> I'll try to experiment with this some in the next few days.  I ran
>>>>> into issues with what I'm working on (push, outgoing) with deeply
>>>>> nested subrepos, and also when a parent locks in an earlier subrepo
>>>>> version.  I wonder if deeply nested subrepos will be a problem here
>>>>> since hgsubrepo.pull() doesn't walk its subrepos and pull them.
>>>>
>>>>
>>>> I must confess that I have not tried that too much. We should
>>>> definitely do this recursively. That being said I hope to get some
>>>> feedback on the current version that I sent to the list first.
>>>
>>>
>>> Sorry, I got crossed up on that too.  hgsubrepo.pull() ends up calling
>>> _repo.pull(), so it does recurse.
>>
>>
>> You are right. That's nice! :-)
>>
>>> The test below indicates that clone won't
>>> recurse- it reminds that an update is needed.  (Maybe clone needs a -S
>>> too
>>> as part of these changes?  If you aren't walking the history, I
>>> don't see a way around that because nothing is incoming after a clone,
>>> so you won't see subrepos of a cloned subrepo.)
>>
>>
>> I guess that would make sense in another patch.
>>
>>> The other thing worth a test is largefiles- the --all-largefiles
>>> option isn't passed to subrepos, so as it stands, 'pull -S' won't let
>>> you really disconnect, because largefiles in subrepos won't be cached.
>>> That
>>> can be fixed later.
>>
>>
>> I agree. We do not really use largefiles (yet) so we have not come
>> across this (yet) but I'm sure this is something we would like to
>> improve as well.
>>
>> One last thing, did you have time to have a look at the code of the
>> patch series itself? I plan to send an slighly changed version of the
>> patch series that will only change the first patch to fix the issue
>> you found (the fix is minimal so the code should be almost the same).
>> Do you have any comments?
>
>
> I did, but I'm not as well versed in this area of code- that's why I've kept
> the conversation fairly high level.  I was able to follow your commit
> message and it looks like the code does what it says.  Just curious- is
> specifying None as the revision the right thing to do in pull()?  AFAICT,
> None gives you the working directory... so does that change the results
> based on what the subrepo is currently updated to?

I use '.' on the latest version of the patch.

Anyway, I think I will resend the series with an added test. We can
think about the other possible improvements later. I really hope I can
get this in before the next release!

Cheers,

Angel

Patch

diff --git a/tests/test-issue2314.t b/tests/test-issue2314.t
new file mode 100644
--- /dev/null
+++ b/tests/test-issue2314.t
@@ -0,0 +1,174 @@ 
+  $ cat >> $HGRCPATH <<EOF
+  > [alias]
+  > slog = log --template '{rev}:{node|short} {desc|firstline}\n'
+  > sout = outgoing --template '{rev}:{node|short} {desc|firstline}\n'
+  > sin  = incoming --template '{rev}:{node|short} {desc|firstline}\n'
+  > EOF
+
+It appears that if the subrepo doesn't exist at the time of the clone,
+both out and push will abort.  I thought this worked...  Avoid the
+issue for now by initing an empty repo where the subrepo will go
+  $ hg init dest
+# XXX: pull aborts even though these are empty?!
+#  $ hg init dest/sub
+#  $ hg init dest/sub/subsub
+  $ hg clone -q dest src
+  $ hg init src/sub
+  $ cd src
+
+  $ echo '0' > sub/foo.txt
+  $ hg add -R sub sub/foo.txt
+  $ hg ci  -R sub -m "standalone subrepo commit"
+
+  $ echo '1' > sub/foo.txt
+  $ echo 'sub = sub' > .hgsub
+  $ hg add .hgsub
+  $ hg ci -S -m "lock in subrepo @ 1"
+  committing subrepository sub
+
+Add a grandchild to test all paths
+  $ hg init sub/subsub
+  $ echo 'subsub' > sub/subsub/bar.txt
+  $ hg add -R sub/subsub sub/subsub/bar.txt
+  $ echo 'subsub = subsub' > sub/.hgsub
+  $ hg add -R sub sub/.hgsub
+
+  $ echo '2' > sub/foo.txt
+  $ hg ci -S -m "lock in subrepo @ 2"
+  committing subrepository sub
+  committing subrepository sub\subsub
+
+  $ echo '3' > sub/foo.txt
+  $ hg ci -R sub -m "standalone subrepo commit @ 3"
+
+Add a grandchild to test all paths
+XXX: This subrepo won't get pushed, even with current code!!
+  $ hg init sub/phantom
+  $ echo 'phantom' > sub/phantom/bar.txt
+  $ hg add -R sub/phantom sub/phantom/bar.txt
+# XXX: this should be '>>' instead of '>'
+  $ echo 'phantom = phantom' > sub/.hgsub
+  $ echo '4' > sub/foo.txt
+  $ hg ci -S -m "lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)"
+  committing subrepository sub
+  committing subrepository sub\phantom
+  $ echo '5' > sub/foo.txt
+  $ hg ci -R sub -m "standalone subrepo commit @ 5"
+
+Change the subrepo back to a previous rev to make sure that if tip of
+parent is pushed, all of the required csets in the subrepo are also
+pushed regardless of what the parent tip has locked in.
+  $ hg up -R sub -r 2
+  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg ci -S -m "lock in subrepo @ 2 (again)"
+#  $ echo '2.1' > sub/foo.txt
+#  $ hg ci -R sub -m "standalone subrepo commit @ 2.1"
+
+The tree built in src
+  $ hg slog
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  1:000653a98f7b lock in subrepo @ 2
+  0:c4c54d98a77b lock in subrepo @ 1
+The tree in src/sub
+  $ hg slog -R sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1
+  0:5879cf081918 standalone subrepo commit
+
+
+Test the various forms of 'incoming'
+
+The current form where entire subrepo is pushed
+  $ hg sin -S --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+This shouldn't push all of the subrepo
+  $ hg sin -S -r 1 --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+
+This should push all of the subrepo that has been locked in
+(i.e. NOT @ 5)
+  $ hg sin -S -r tip  --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  0:c4c54d98a77b lock in subrepo @ 1
+  1:000653a98f7b lock in subrepo @ 2
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+This should want to pull the subsub repo
+  $ hg pull -S -r 1  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 3 changes to 2 files
+  cloning subrepo sub from $TESTTMP/src/sub
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  (run 'hg update' to get a working copy)
+
+  $ hg slog -R ../dest
+  1:000653a98f7b lock in subrepo @ 2
+  0:c4c54d98a77b lock in subrepo @ 1
+  $ hg slog -R ../dest/sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1
+  0:5879cf081918 standalone subrepo commit
+
+This should want to pull subrepo rev 4, even though rev 2 is locked in
+here
+  $ hg sin -S -r 3  --config paths.default=. -R ../dest
+  comparing with $TESTTMP\src
+  searching for changes
+  2:0aa0d45d1538 lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:b6ef204e81fa lock in subrepo @ 2 (again)
+
+..and this should pull subrepo rev 4
+  $ hg pull -S -r 3  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 2 changes to 1 files
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  pulling subrepo sub from $TESTTMP/src/sub
+  searching for changes
+  no changes found
+  (run 'hg update' to get a working copy)
+
+Pull all subrepo revs locked into the parent
+  $ hg pull -S -r tip  --config paths.default=. -R ../dest
+  pulling from $TESTTMP\src
+  no changes found
+
+  $ hg slog -R ../dest/sub
+  5:5b1b79eebbd2 standalone subrepo commit @ 5
+  4:cac37e36a35f lock in subrepo @ 4 (shouldn't be seen in subrepo 
listing with push -r 1)
+  3:4aef09d75093 standalone subrepo commit @ 3
+  2:f9f5c072ebeb lock in subrepo @ 2
+  1:30154f8598be lock in subrepo @ 1