Patchwork [1,of,2,RFC] subrepo: create subrepo cache and create working dir subrepos as shared repos

Submitter Angel Ezquerra
Date Nov. 15, 2013, 8:15 p.m.
Message ID <eabbdd067b8e2f4c3dc7.1384546521@Angel-PC.localdomain>
Permalink /patch/2950/
State Deferred

Comments

Angel Ezquerra - Nov. 15, 2013, 8:15 p.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1383263921 -3600
#      Fri Nov 01 00:58:41 2013 +0100
# Node ID eabbdd067b8e2f4c3dc7ba9678b0acb54e1ed710
# Parent  c38c3fdc8b9317ba09e03ab09364c3800da7c50c
subrepo: create subrepo cache and create working dir subrepos as shared repos

With this change, cloned subrepos are no longer cloned directly as regular
repositories in the working directory. Instead, they are first cloned into a
subrepo "cache" (in the parent repository's ".hg/cache/subs" directory). Once
this is done, the "working directory subrepos" are created as "shared
repositories" from the actual repositories in the subs cache directory.
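The layout described above can be simulated with plain files. This is a minimal sketch in Python 3 that does not call Mercurial. The helper names `create_cached_subrepo` and `store_dir` are hypothetical. It assumes the working-directory subrepo's `.hg` holds only a `sharedpath` file naming the cached repository's `.hg` directory. This roughly mirrors what a shared repository created with `hg.share` looks like. In this sketch, the file holds an absolute path.

```python
import os
import tempfile


def create_cached_subrepo(parent_root, subpath):
    """Hypothetical helper: simulate the cache + shared working copy layout."""
    # Full repository (the history lives here): .hg/cache/subs/SUB_PATH/.hg
    cached_hg = os.path.join(parent_root, '.hg', 'cache', 'subs', subpath, '.hg')
    os.makedirs(os.path.join(cached_hg, 'store'), exist_ok=True)
    # Working-directory subrepo: a .hg dir that only points at the cache.
    work_hg = os.path.join(parent_root, subpath, '.hg')
    os.makedirs(work_hg, exist_ok=True)
    with open(os.path.join(work_hg, 'sharedpath'), 'w') as f:
        f.write(cached_hg)
    return cached_hg, work_hg


def store_dir(repo_root):
    """Return the store directory, following .hg/sharedpath if present."""
    hg_dir = os.path.join(repo_root, '.hg')
    sharedpath = os.path.join(hg_dir, 'sharedpath')
    if os.path.exists(sharedpath):
        with open(sharedpath) as f:
            hg_dir = f.read().strip()
    return os.path.join(hg_dir, 'store')


parent = tempfile.mkdtemp()
cached_hg, work_hg = create_cached_subrepo(parent, 'foo')
print(store_dir(os.path.join(parent, 'foo')) == os.path.join(cached_hg, 'store'))  # True
```

Any reader that follows `sharedpath` finds the history in the cache. A plain, unshared repository falls back to its own `.hg/store`.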

The main motivation for this patch is three-fold:

- To make it much safer to remove a subrepo from the working directory.
- To avoid the need to reclone a subrepo from the source repository when you
delete it from your working directory (which you must sometimes do before
running hg update if you converted a regular folder into a subrepo or
vice versa).
- To eventually be able to automatically and safely remove subrepos from the
working directory when updating to a revision that has deleted the subrepo.
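The first two motivations can be illustrated with a short Python 3 sketch that does not use Mercurial. It assumes the on-disk convention of a `.hg/sharedpath` file. The functions `is_shared_subrepo`, `remove_subrepo` and `restore_subrepo` are hypothetical. The working copy is deleted only when its history is known to live in the cache. It can then be recreated from the cache without touching the source repository. Checking out the files again would still need an update, which this sketch leaves out.

```python
import os
import shutil
import tempfile


def is_shared_subrepo(work_root):
    """True if the working copy only points at history stored elsewhere."""
    sharedpath = os.path.join(work_root, '.hg', 'sharedpath')
    if not os.path.isfile(sharedpath):
        return False
    with open(sharedpath) as f:
        target = f.read().strip()
    return os.path.isdir(os.path.join(target, 'store'))


def remove_subrepo(work_root):
    """Delete a working-directory subrepo only if its history is cached."""
    if not is_shared_subrepo(work_root):
        raise RuntimeError('refusing to remove %s: history is not cached' % work_root)
    shutil.rmtree(work_root)


def restore_subrepo(work_root, cached_hg):
    """Recreate the working copy's .hg from the cache instead of recloning."""
    os.makedirs(os.path.join(work_root, '.hg'))
    with open(os.path.join(work_root, '.hg', 'sharedpath'), 'w') as f:
        f.write(cached_hg)


parent = tempfile.mkdtemp()
cached_hg = os.path.join(parent, '.hg', 'cache', 'subs', 'foo', '.hg')
os.makedirs(os.path.join(cached_hg, 'store'))
work = os.path.join(parent, 'foo')
restore_subrepo(work, cached_hg)   # initial shared working copy
remove_subrepo(work)               # safe: history stays in the cache
restore_subrepo(work, cached_hg)   # no reclone from the source needed
```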

In the longer term, it _might_ even be possible to efficiently implement
"advanced" subrepo operations such as moving or copying subrepos.

This change is fully functional but there are a few things left TODO:

- Change hgweb to be "subcache aware" (will be done in a later patch).
- Handle the addition of new local subrepos to the cache (e.g. an existing,
regular repository that is added as a subrepo to a parent repository).
- Handle the deletion of subrepositories (should be easier to handle thanks to
this patch).
- Perhaps we should go beyond what a regular "shared" repository does, and try
to create most files in the cached repository, including mq patches.
Alternatively, we could check whether any "unknown" files exist in the working
directory copy of the subrepo and ask the user what to do when a subrepo is
removed.
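The alternative in the last item, checking for "unknown" files before removing a subrepo, could look roughly like the following. This is a Python 3 sketch with hypothetical names. It assumes the set of tracked files is already available as `/`-separated relative paths. In Mercurial that information would come from the subrepo's dirstate, which this sketch does not read. The `.hg` directory is skipped.

```python
import os
import tempfile


def find_unknown_files(subrepo_root, tracked):
    """Return sorted files under subrepo_root that are not in `tracked`.

    `tracked` is a hypothetical set of '/'-separated paths relative to
    subrepo_root. Repository metadata (.hg) is ignored.
    """
    unknown = []
    for dirpath, dirnames, filenames in os.walk(subrepo_root):
        # Prune .hg so os.walk does not descend into repository metadata.
        if '.hg' in dirnames:
            dirnames.remove('.hg')
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), subrepo_root)
            rel = rel.replace(os.sep, '/')
            if rel not in tracked:
                unknown.append(rel)
    return sorted(unknown)


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, '.hg'))
os.makedirs(os.path.join(root, 'src'))
for rel in ('a.txt', 'src/b.py', 'notes.tmp', '.hg/sharedpath'):
    with open(os.path.join(root, *rel.split('/')), 'w') as f:
        f.write('x')
print(find_unknown_files(root, {'a.txt', 'src/b.py'}))  # ['notes.tmp']
```

A non-empty result is the case where the user would be asked what to do before the subrepo is removed.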

Other notes:

- This works fine even if the share extension is not enabled.
- This change is backwards compatible.
- For simplicity, the cached subrepos are currently located at
.hg/cache/subs/SUB_PATH. It might be safer to use a hashed name to avoid
problems with long paths.
- I have removed a test that no longer made sense from test-subrepo-recursion.t
and slightly changed a test in test-subrepo.t.
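The hashed-name idea from the notes above could be sketched as follows. This is a Python 3 illustration of a possible scheme, not what the patch does, since the patch uses `.hg/cache/subs/SUB_PATH` directly. The function name and the choice of SHA-1 are assumptions. Every cache entry gets a fixed 40-character directory name, however deeply nested the subrepo is.

```python
import hashlib
import os


def cached_subrepo_dir(parent_hg_dir, subpath):
    """Hypothetical: map a subrepo path to a fixed-length cache directory."""
    # Normalize separators and trailing slashes so equal paths hash equally.
    key = subpath.replace(os.sep, '/').strip('/')
    digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
    return os.path.join(parent_hg_dir, 'cache', 'subs', digest)


d = cached_subrepo_dir('/repo/.hg', 'very/deeply/nested/subrepo')
print(len(os.path.basename(d)))  # 40
```

A hash is one-way, so a real implementation would probably also need to record the original subrepo path next to each entry, for example to list or clean up the cache.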
Matt Mackall - Jan. 13, 2014, 8:38 p.m.
On Fri, 2013-11-15 at 21:15 +0100, Angel Ezquerra wrote:
> - This change is backwards compatible.

What if an old client tries to work with a repository checked out by a
newer hg?

What if a new client tries to work with a repository checked out by an
older hg?

I don't see any form of fallback handling, so I'm not sure how either of
these could work.
Matt Mackall - Jan. 13, 2014, 8:41 p.m.
On Mon, 2014-01-13 at 14:38 -0600, Matt Mackall wrote:
> On Fri, 2013-11-15 at 21:15 +0100, Angel Ezquerra wrote:
> > - This change is backwards compatible.
> 
> What if an old client tries to work with a repository checked out by a
> newer hg?
> 
> What if a new client tries to work with a repository checked out by an
> older hg?
> 
> I don't see any form of fallback handling, so I'm not sure how either of
> these could work.

Sorry, I see how this works now. Very clever.

This does, however, leave us with a nasty cache management problem.
Angel Ezquerra - Jan. 17, 2014, 8:40 p.m.
On Mon, Jan 13, 2014 at 12:41 PM, Matt Mackall <mpm@selenic.com> wrote:
> On Mon, 2014-01-13 at 14:38 -0600, Matt Mackall wrote:
>> On Fri, 2013-11-15 at 21:15 +0100, Angel Ezquerra wrote:
>> > - This change is backwards compatible.
>>
>> What if an old client tries to work with a repository checked out by a
>> newer hg?
>>
>> What if a new client tries to work with a repository checked out by an
>> older hg?
>>
>> I don't see any form of fallback handling, so I'm not sure how either of
>> these could work.
>
> Sorry, I see how this works now. Very clever.

Thanks :-)

> This does, however, leave us with a nasty cache management problem.

I'm not quite sure what exactly you mean by a nasty cache management
problem. Could you elaborate?

In any case, I don't think these changes make things any worse than
they are now. We already have a subrepository cache: the working
directory itself! What this patch series does is move that existing
implicit cache into the .hg folder, which fixes (or makes it possible
to fix) a number of well-known issues with the current subrepo
behavior:

1. You cannot remove subrepos from your working directory unless you
are willing to reclone them if you update to a revision that needs
them. Even if you are willing to do that, you cannot serve your
repo to other users because of the missing subrepos.

2. Problem #1 makes it very tricky to convert an existing regular
folder into a subrepo or vice versa. If you do, there are problems when
you try to update back and forth between the corresponding parent repo
revisions.

3. Moving or copying a subrepo requires having two full copies of the
subrepo. With a proper cache we could potentially avoid that. Also,
both copies must be in the working directory, including one at the
original location of the subrepo that you moved, which is very ugly.

4. When you use subrepos, the .hg folder does not contain the whole
history of your repository (this feels to me like a deviation from the
basic Mercurial design).

So I think this is a net win...

Angel

Patch


diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -463,12 +463,19 @@ 
         self._path = path
         self._state = state
         r = ctx._repo
+        cachedroot = os.path.join(r.join('cache'), 'subs', path)
+        create = False
+        if not os.path.exists(os.path.join(cachedroot, '.hg')):
+            util.makedirs(cachedroot)
+            create = True
+        self._cachedrepo = hg.repository(r.baseui, cachedroot, create=create)
         root = r.wjoin(path)
         create = False
         if not os.path.exists(os.path.join(root, '.hg')):
             create = True
             util.makedirs(root)
-        self._repo = hg.repository(r.baseui, root, create=create)
+            hg.share(r.baseui, cachedroot, root, False)
+        self._repo = hg.repository(r.baseui, root, create=False)
         for s, k in [('ui', 'commitsubrepos')]:
             v = r.ui.config(s, k)
             if v:
@@ -507,12 +514,12 @@ 
         filelist = ('bookmarks', 'store/phaseroots', 'store/00changelog.i')
         yield '# %s\n' % _expandedabspath(remotepath)
         for relname in filelist:
-            absname = os.path.normpath(self._repo.join(relname))
+            absname = os.path.normpath(self._cachedrepo.join(relname))
             yield '%s = %s\n' % (relname, _calcfilehash(absname))
 
     def _getstorehashcachepath(self, remotepath):
         '''get a unique path for the store hash cache'''
-        return self._repo.join(os.path.join(
+        return self._cachedrepo.join(os.path.join(
             'cache', 'storehash', _getstorehashcachename(remotepath)))
 
     def _readstorehashcache(self, remotepath):
@@ -655,11 +662,12 @@ 
                 self._repo.ui.status(_('cloning subrepo %s from %s\n')
                                      % (subrelpath(self), srcurl))
                 parentrepo = self._repo._subparent
-                shutil.rmtree(self._repo.path)
+                shutil.rmtree(self._cachedrepo.path)
                 other, cloned = hg.clone(self._repo._subparent.baseui, {},
-                                         other, self._repo.root,
+                                         other, self._cachedrepo.root,
                                          update=False)
-                self._repo = cloned.local()
+                self._repo = hg.repository(self._repo._subparent.baseui,
+                                           self._repo.root, create=False)
                 self._initrepo(parentrepo, source, create=True)
                 self._cachestorehash(srcurl)
             else:
diff --git a/tests/test-subrepo-recursion.t b/tests/test-subrepo-recursion.t
--- a/tests/test-subrepo-recursion.t
+++ b/tests/test-subrepo-recursion.t
@@ -377,18 +377,6 @@ 
 
   $ mv $HGRCPATH.no-progress $HGRCPATH
 
-Test archiving when there is a directory in the way for a subrepo
-created by archive:
-
-  $ hg clone -U . ../almost-empty
-  $ cd ../almost-empty
-  $ mkdir foo
-  $ echo f > foo/f
-  $ hg archive --subrepos -r tip archive
-  cloning subrepo foo from $TESTTMP/empty/foo
-  abort: destination '$TESTTMP/almost-empty/foo' is not empty (in subrepo foo) (glob)
-  [255]
-
 Clone and test outgoing:
 
   $ cd ..
diff --git a/tests/test-subrepo.t b/tests/test-subrepo.t
--- a/tests/test-subrepo.t
+++ b/tests/test-subrepo.t
@@ -856,7 +856,7 @@ 
   updating working directory
   cloning subrepo subrepo-2 from $TESTTMP/subrepo-status/subrepo-2
   2 files updated, 0 files merged, 0 files removed, 0 files unresolved
-  $ test -f ../shared/subrepo-1/.hg/sharedpath
+  $ test -f ../shared/.hg/cache/subs/subrepo-1/sharedpath
   [1]
   $ hg -R ../shared in
   abort: repository default not found!