Patchwork [1,of,3] pull: add --subrepos flag

Submitter Angel Ezquerra
Date Feb. 17, 2013, 12:19 p.m.
Message ID <abbd26cca35280fb8f78.1361103556@Angel-PC.localdomain>
Permalink /patch/1020/
State Superseded, archived

Comments

Angel Ezquerra - Feb. 17, 2013, 12:19 p.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1360519226 -3600
# Node ID abbd26cca35280fb8f784b3f2c02eef71696c47b
# Parent  55b9b294b7544a6a144f627f71f4b770907d5a98
pull: add --subrepos flag

The purpose of this new flag is to ensure that you are able to update to any
incoming revision without requiring any network access. The idea is to make sure
that the repository is self-contained after doing hg pull --subrepos (as long as
it already was self-contained before the pull).

When the --subrepos flag is enabled, pull will also pull (or clone) all subrepos
that are present on the current revision and those that are referenced by any of
the incoming revisions.

If the incoming revisions refer to subrepos that are not in the working
directory yet, they will be cloned. If any of the subrepositories changes its
pull source (as defined in the .hgsub file), it will be pulled from both the
current and the new source.
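
As a rough illustration of the selection rule described above, here is a toy
model in plain Python. It does not use Mercurial's API: the function name
subrepos_to_pull, the dictionaries and the example URLs are all hypothetical,
and deduplicating by (path, source) is a simplification made for this sketch.

```python
# Toy model: each revision maps a subrepo path to a (source, revision, kind)
# tuple, loosely mirroring the substate entries read from .hgsub/.hgsubstate.
def subrepos_to_pull(current, incoming):
    """Return the sorted, unique (path, source) pairs referenced by the
    current revision or by any of the incoming revisions."""
    pairs = set()
    for substate in [current] + list(incoming):
        for path, (source, _rev, _kind) in substate.items():
            pairs.add((path, source))
    return sorted(pairs)


# Hypothetical data: "lib" changes its source, "docs" is a new subrepo.
current = {"lib": ("http://old.example/lib", "a1", "hg")}
incoming = [
    {"lib": ("http://old.example/lib", "b2", "hg")},
    {"lib": ("http://new.example/lib", "c3", "hg"),
     "docs": ("http://new.example/docs", "d4", "hg")},
]

plan = subrepos_to_pull(current, incoming)
for path, source in plan:
    print("pull %s from %s" % (path, source))
```

In this example "lib" appears twice in the plan, once per source, and "docs"
appears once because it only exists on an incoming revision.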

This first patch only supports mercurial subrepos (a NotImplementedError
exception will be raised for all other subrepo types). Future patches will add
support for other subrepo types.

This requires some tests (I will add them after review). I ran the whole test
suite and all non-skipped tests passed (# Ran 426 tests, 29 skipped, 0 failed.).
Matt Harbison - Feb. 20, 2013, 5:57 a.m.
On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:

> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1360519226 -3600
> # Node ID abbd26cca35280fb8f784b3f2c02eef71696c47b
> # Parent  55b9b294b7544a6a144f627f71f4b770907d5a98
> pull: add --subrepos flag
> 
> The purpose of this new flag is to ensure that you are able to update to any
> incoming revision without requiring any network access. The idea is to make sure
> that the repository is self-contained after doing hg pull --subrepos, as long as
> it already was self-contained before the pull).
> 
> When the --subrepos flag is enabled, pull will also pull (or clone) all subrepos
> that are present on the current revision and those that are referenced by any of
> the incoming revisions.

I haven't had a chance to really play with this yet, so I'm going mostly off the
comments here. I apologize if these answers should be obvious, but I'm not familiar
enough with some of the code.

 - Is there an easy way to tell if the repo is/was self contained?  (Maybe
   incoming -S?)

 - Is the 'self-contained' bit to limit overhead on each pull, or is there another
   reason this can't ensure the result is self contained?  'Push' and 'outgoing -S'
   recognize (almost) everything going in the other direction, so it might be nice
   to have the same capability with a form of pull.  (I may have found a push bug
   that I haven't gotten back to yet.)

 - The full subrepo gets pulled, even revs not committed to the parent?  I think
   that's a good thing, because I regularly get burned when I 'pull -u' the tree to
   another machine and then go to apply the rest of a patch queue to the subrepo.

I'll try to experiment with this some in the next few days.  I ran into issues with
what I'm working on (push, outgoing) with deeply nested subrepos, and also when a
parent locks in an earlier subrepo version.  I wonder if deeply nested subrepos will
be a problem here since hgsubrepo.pull() doesn't walk its subrepos and pull them.

--Matt
Angel Ezquerra - Feb. 20, 2013, 11:19 p.m.
On Wed, Feb 20, 2013 at 6:57 AM, Matt Harbison <matt_harbison@yahoo.com> wrote:
> On Sun, 17 Feb 2013 13:19:16 +0100, Angel Ezquerra wrote:
>
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1360519226 -3600
>> # Node ID abbd26cca35280fb8f784b3f2c02eef71696c47b
>> # Parent  55b9b294b7544a6a144f627f71f4b770907d5a98
>> pull: add --subrepos flag
>>
>> The purpose of this new flag is to ensure that you are able to update to any
>> incoming revision without requiring any network access. The idea is to make sure
>> that the repository is self-contained after doing hg pull --subrepos, as long as
>> it already was self-contained before the pull).
>>
>> When the --subrepos flag is enabled, pull will also pull (or clone) all subrepos
>> that are present on the current revision and those that are referenced by any of
>> the incoming revisions.
>
> I haven't gotten a chance to really play with this yet, so I'm going more off the
> comments here- I apologize if these answers should be obvious, but I'm not familiar
> enough with some of the code.
>
>  - Is there an easy way to tell if the repo is/was self contained?  (Maybe
>    incoming -S?)

No, there is not. I don't think incoming -S would do the trick since
that would just tell you if there are _new_ incoming revisions on some
of the _current_ subrepos. A repo is "self-contained" if it is
possible to update to any of its revisions without requiring a pull of
one or more of its subrepos.

I don't know of any existing mercurial command that would be able to
give you that information.
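
To make the definition concrete, here is a minimal sketch of such a check on a
toy model. It is not an existing Mercurial command or API: the helper names
missing_subrepo_revs and is_self_contained and the data layout are hypothetical,
chosen only to illustrate the idea.

```python
def missing_subrepo_revs(parent_revs, local_subrepo_revs):
    """Return the sorted (path, rev) pairs that some parent revision
    references but that are not available locally.

    parent_revs: list of dicts mapping subrepo path -> (source, rev, kind)
    local_subrepo_revs: dict mapping subrepo path -> set of local revision ids
    """
    missing = set()
    for substate in parent_revs:
        for path, (_source, rev, _kind) in substate.items():
            if rev not in local_subrepo_revs.get(path, set()):
                missing.add((path, rev))
    return sorted(missing)


def is_self_contained(parent_revs, local_subrepo_revs):
    """True if every referenced subrepo revision is already present locally."""
    return not missing_subrepo_revs(parent_revs, local_subrepo_revs)


# Hypothetical history: the second revision adds a "docs" subrepo.
history = [
    {"lib": ("src-lib", "a1", "hg")},
    {"lib": ("src-lib", "b2", "hg"), "docs": ("src-docs", "d4", "hg")},
]
local = {"lib": {"a1", "b2"}}

print(missing_subrepo_revs(history, local))
print(is_self_contained(history, local))
```

Note that a check like this has to walk the whole parent history, which is the
cost that looking only at the current and incoming revisions avoids.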

>  - Is the 'self-contained' bit to limit overhead on each pull, or is there another
>    reason this can't ensure the result is self contained?  'Push' and 'outgoing -S'
>    recognize (almost) everything going in the other direction, so it might be nice
>    to have the same capability with a form of pull.  (I may have found a push bug
>    that I haven't gotten back to yet.)

I'm not sure I understand what you mean. I don't think you (we?) should
give too much importance to this "self-contained" concept. It is just
a way for me to explain the purpose of the patch, and especially to
explain why we must look for subrepos on all the new incoming
revisions, and why we cannot just limit ourselves to pulling the
subrepos on the current revision (short answer: because new subrepos
may appear on the new, incoming revisions).

My patch explicitly says that hg pull -S will only make your repo
self-contained if it was already self-contained before. This is in
order to avoid having to look for subrepos on all the repo history,
rather than just looking for subrepos on the incoming revisions (and
the current one).

>  - The full subrepo gets pulled, even revs not committed to the parent?  I think
>    that's a good thing, because regularly get burned when I 'pull -u' the tree to
>    another machine and then go to apply the rest of a patch queue to the subrepo.

Yes. It is perhaps not optimal but I think it is simpler. In addition,
if different parent repo revisions point to different revisions on a
subrepo, there is no way for us to tell which of those subrepo
revisions is the one that is closest to tip, or which ones are
ancestors of the other ones, etc. As a result we would need to perform
as many pulls on a given repo as the number of different revisions of
that subrepo that were referenced on the parent repo. That is complex
and slow, so it is much simpler and possibly faster (in some cases at
least) to just pull all revisions from each subrepo.

> I'll try to experiment with this some in the next few days.  I ran into issues with
> what I'm working on (push, outgoing) with deeply nested subrepos, and also when a
> parent locks in an earlier subrepo version.  I wonder if deeply nested subrepos will
> be a problem here since hgsubrepo.pull() doesn't walk its subrepos and pull them.

I must confess that I have not tested that much. We should
definitely do this recursively. That being said, I hope to get some
feedback on the current version that I sent to the list first.
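
A recursive walk could look roughly like the sketch below. This is a toy model,
not the patch and not Mercurial's API: pull_recursive, the children mapping
and the example paths are hypothetical, and the pull callback only records the
path it is given.

```python
def pull_recursive(path, children, pull):
    """Pull the subrepo at `path`, then walk its nested subrepos depth-first.

    children: dict mapping a subrepo path to the names of its direct subrepos
    pull: callable invoked once with the full path of each subrepo
    """
    pull(path)
    for child in children.get(path, ()):
        pull_recursive(path + "/" + child, children, pull)


# Hypothetical layout: lib contains vendor, which in turn contains zlib.
children = {"lib": ["vendor"], "lib/vendor": ["zlib"]}

pulled = []
for top in ["lib", "docs"]:
    pull_recursive(top, children, pulled.append)

print(pulled)
```

Each nested subrepo is visited after its parent, so a parent is always
available before its children are pulled.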

Cheers,

Angel

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -18,6 +18,7 @@ 
 import dagparser, context, simplemerge, graphmod
 import random, setdiscovery, treediscovery, dagutil, pvec, localrepo
 import phases, obsolete
+import itertools
 
 table = {}
 
@@ -4694,6 +4695,7 @@ 
     ('B', 'bookmark', [], _("bookmark to pull"), _('BOOKMARK')),
     ('b', 'branch', [], _('a specific branch you would like to pull'),
      _('BRANCH')),
+    ('S', 'subrepos', None, _('pull current and incoming subrepos')),
     ] + remoteopts,
     _('[-u] [-f] [-r REV]... [-e CMD] [--remotecmd CMD] [SOURCE]'))
 def pull(ui, repo, source="default", **opts):
@@ -4738,7 +4740,25 @@ 
                     "so a rev cannot be specified.")
             raise util.Abort(err)
 
+    oldtip = len(repo)
     modheads = repo.pull(other, heads=revs, force=opts.get('force'))
+
+    # update current and new subrepos
+    if opts.get('subrepos'):
+        # pull (or clone) the subrepos that are referenced by the
+        # current revision or by any of the incoming revisions
+        substopull = {}
+        revstocheck = itertools.chain(['.'], repo.changelog.revs(oldtip))
+        for rev in revstocheck:
+            ctx = repo[rev]
+            for sname, sinfo in ctx.substate.items():
+                substopull[(sname, sinfo,)] = ctx
+
+        # note that we must pull the subrepos before calling postincoming
+        # to avoid pulling them again if --update is also set
+        for (sname, sinfo), ctx in substopull.items():
+            ctx.sub(sname).pull(sinfo[0])
+
     bookmarks.updatefromremote(ui, repo, other, source)
     if checkout:
         checkout = str(repo.changelog.rev(other.lookup(checkout)))
diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -330,6 +330,11 @@ 
         """
         raise NotImplementedError
 
+    def pull(self, source):
+        """pull from the given parent repository source
+        """
+        raise NotImplementedError
+
     def get(self, state, overwrite=False):
         """run whatever commands are needed to put the subrepo into
         this state
@@ -528,6 +533,10 @@ 
         self._repo.ui.note(_('removing subrepo %s\n') % subrelpath(self))
         hg.clean(self._repo, node.nullid, False)
 
+    def pull(self, source):
+        state = (source, None, 'hg')
+        return self._get(state)
+
     def _get(self, state):
         source, revision, kind = state
         if revision not in self._repo:
diff --git a/tests/test-debugcomplete.t b/tests/test-debugcomplete.t
--- a/tests/test-debugcomplete.t
+++ b/tests/test-debugcomplete.t
@@ -204,7 +204,7 @@ 
   init: ssh, remotecmd, insecure
   log: follow, follow-first, date, copies, keyword, rev, removed, only-merges, user, only-branch, branch, prune, patch, git, limit, no-merges, stat, graph, style, template, include, exclude
   merge: force, rev, preview, tool
-  pull: update, force, rev, bookmark, branch, ssh, remotecmd, insecure
+  pull: update, force, rev, bookmark, branch, subrepos, ssh, remotecmd, insecure
   push: force, rev, bookmark, branch, new-branch, ssh, remotecmd, insecure
   remove: after, force, include, exclude
   serve: accesslog, daemon, daemon-pipefds, errorlog, port, address, prefix, name, web-conf, webdir-conf, pid-file, stdio, cmdserver, templates, style, ipv6, certificate