Patchwork [4,of,6,V2] subrepo: implement "storeclean" method for mercurial subrepos

login
register
mail settings
Submitter Angel Ezquerra
Date March 9, 2013, 12:12 a.m.
Message ID <c363610c5a98db816710.1362787957@Angel-PC.localdomain>
Download mbox | patch
Permalink /patch/1090/
State Accepted
Commit aab0c14c20d0ed64ea08e490f628d0d0fef84436
Headers show

Comments

Angel Ezquerra - March 9, 2013, 12:12 a.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1360973933 -3600
#      Sat Feb 16 01:18:53 2013 +0100
# Node ID c363610c5a98db8167102cde019fdca4966e7c02
# Parent  68ebd12161c9fb787c378630307840ec95a16cf1
subrepo: implement "storeclean" method for mercurial subrepos

The mercurial subrepo "storeclean" method works by calculating a "store hash" of
the repository state and comparing it to a cached store hash. The store hash is
always cached when the repository is cloned from or pushed to a remote
repository, but also on pull as long as the repository already had a clean
store. If the hashes match the store is "clean" versus the selected repository.

Note that this method is currenty unused, but it will be used by a later patch.

The store hash is calculated by hashing several key repository files, such as
the bookmarks file the phaseroots file and the changelog. Note that the hash
comparison is done file by file so that we can exit early if a pair of hashes
do not match. Also the hashes are calculated starting with the file that is
most likely to be smaller upto the file that is more likely to be larger.
Augie Fackler - March 12, 2013, 12:31 a.m.
On Mar 8, 2013, at 6:12 PM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:

> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1360973933 -3600
> #      Sat Feb 16 01:18:53 2013 +0100
> # Node ID c363610c5a98db8167102cde019fdca4966e7c02
> # Parent  68ebd12161c9fb787c378630307840ec95a16cf1
> subrepo: implement "storeclean" method for mercurial subrepos

I'm not a subrepos expert, so I'd like someone else to eyeball this before it goes in. That said, LGTM.

For Git you only need to hash the state of the repository's refs, which would be doable as sha1(os.system("git show-ref"))[0].

For Subversion I *think* you can just check if the wc is dirty. If the wc is dirtied, we need to take action, and if not we're already set. Maybe that already happens at commit time?

AF

[0]: Don't actually use os.system()! Here for illustrative purposes only.
Angel Ezquerra - March 12, 2013, 5:55 p.m.
On Tue, Mar 12, 2013 at 1:31 AM, Augie Fackler <raf@durin42.com> wrote:
>
> On Mar 8, 2013, at 6:12 PM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:
>
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1360973933 -3600
>> #      Sat Feb 16 01:18:53 2013 +0100
>> # Node ID c363610c5a98db8167102cde019fdca4966e7c02
>> # Parent  68ebd12161c9fb787c378630307840ec95a16cf1
>> subrepo: implement "storeclean" method for mercurial subrepos
>
> I'm not a subrepos expert, so I'd like someone else to eyeball this before it goes in. That said, LGTM.
>
> For Git you only need to hash the state of the repository's refs, which would be doable as sha1(os.system("git show-ref"))[0].
>
> For Subversion I *think* you can just check if the wc is dirty. If the wc is dirtied, we need to take action, and if not we're already set. Maybe that already happens at commit time?
>
> AF
>
> [0]: Don't actually use os.system()! Here for illustrative purposes only.

Thank you Augie,

would you be ok with the git and subrepo support being added on a
separate patch series, once this series has made it in?

As for subversion, I'm not an expert, but I guess you are right. Maybe
we should also check if the current working copy points to the
revision number that is pointed to by the parent repository? This
would be just in case someone used a svn command to checkout a
different revision... Maybe a subversion expert could give us some
advice (maybe Dan?)...

Cheers,

Angel
Augie Fackler - March 12, 2013, 6:52 p.m.
On Mar 12, 2013, at 10:55 AM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:

> On Tue, Mar 12, 2013 at 1:31 AM, Augie Fackler <raf@durin42.com> wrote:
>> 
>> On Mar 8, 2013, at 6:12 PM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:
>> 
>>> # HG changeset patch
>>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>>> # Date 1360973933 -3600
>>> #      Sat Feb 16 01:18:53 2013 +0100
>>> # Node ID c363610c5a98db8167102cde019fdca4966e7c02
>>> # Parent  68ebd12161c9fb787c378630307840ec95a16cf1
>>> subrepo: implement "storeclean" method for mercurial subrepos
>> 
>> I'm not a subrepos expert, so I'd like someone else to eyeball this before it goes in. That said, LGTM.
>> 
>> For Git you only need to hash the state of the repository's refs, which would be doable as sha1(os.system("git show-ref"))[0].
>> 
>> For Subversion I *think* you can just check if the wc is dirty. If the wc is dirtied, we need to take action, and if not we're already set. Maybe that already happens at commit time?
>> 
>> AF
>> 
>> [0]: Don't actually use os.system()! Here for illustrative purposes only.
> 
> Thank you Augie,
> 
> would you be ok with the git and subrepo support being added on a
> separate patch series, once this series has made it in?

I think this "subrepo" was supposed to be "subversion"?

Yeah, that's fine.

> As for subversion, I'm not an expert, but I guess you are right. Maybe
> we should also check if the current working copy points to the
> revision number that is pointed to by the parent repository? This
> would be just in case someone used a svn command to checkout a
> different revision... Maybe a subversion expert could give us some
> advice (maybe Dan?)...

I'm 95% sure clean-wc is right. Added some people that have done some Subversion pondering with me in the past for consultation.

> 
> Cheers,
> 
> Angel
Angel Ezquerra - March 12, 2013, 7 p.m.
On Tue, Mar 12, 2013 at 7:52 PM, Augie Fackler <raf@durin42.com> wrote:
>
> On Mar 12, 2013, at 10:55 AM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:
>
>> On Tue, Mar 12, 2013 at 1:31 AM, Augie Fackler <raf@durin42.com> wrote:
>>>
>>> On Mar 8, 2013, at 6:12 PM, Angel Ezquerra <angel.ezquerra@gmail.com> wrote:
>>>
>>>> # HG changeset patch
>>>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>>>> # Date 1360973933 -3600
>>>> #      Sat Feb 16 01:18:53 2013 +0100
>>>> # Node ID c363610c5a98db8167102cde019fdca4966e7c02
>>>> # Parent  68ebd12161c9fb787c378630307840ec95a16cf1
>>>> subrepo: implement "storeclean" method for mercurial subrepos
>>>
>>> I'm not a subrepos expert, so I'd like someone else to eyeball this before it goes in. That said, LGTM.
>>>
>>> For Git you only need to hash the state of the repository's refs, which would be doable as sha1(os.system("git show-ref"))[0].
>>>
>>> For Subversion I *think* you can just check if the wc is dirty. If the wc is dirtied, we need to take action, and if not we're already set. Maybe that already happens at commit time?
>>>
>>> AF
>>>
>>> [0]: Don't actually use os.system()! Here for illustrative purposes only.
>>
>> Thank you Augie,
>>
>> would you be ok with the git and subrepo support being added on a
>> separate patch series, once this series has made it in?
>
> I think this "subrepo" was supposed to be "subversion"?

Yes, that is what I meant. Sorry!

> Yeah, that's fine.
>
>> As for subversion, I'm not an expert, but I guess you are right. Maybe
>> we should also check if the current working copy points to the
>> revision number that is pointed to by the parent repository? This
>> would be just in case someone used a svn command to checkout a
>> different revision... Maybe a subversion expert could give us some
>> advice (maybe Dan?)...
>
> I'm 95% sure clean-wc is right. Added some people that have done some Subversion pondering with me in the past for consultation.
>

Awesome, thanks!

Angel

Patch

diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -446,6 +446,72 @@ 
         self._repo.ui.setconfig('ui', '_usedassubrepo', 'True')
         self._initrepo(r, state[0], create)
 
+    def storeclean(self, path):
+        clean = True
+        lock = self._repo.lock()
+        itercache = self._calcstorehash(path)
+        try:
+            for filehash in self._readstorehashcache(path):
+                if filehash != itercache.next():
+                    clean = False
+                    break
+        except StopIteration:
+            # the cached and current pull states have a different size
+            clean = False
+        if clean:
+            try:
+                itercache.next()
+                # the cached and current pull states have a different size
+                clean = False
+            except StopIteration:
+                pass
+        lock.release()
+        return clean
+
+    def _calcstorehash(self, remotepath):
+        '''calculate a unique "store hash"
+
+        This method is used to to detect when there are changes that may
+        require a push to a given remote path.'''
+        # sort the files that will be hashed in increasing (likely) file size
+        filelist = ('bookmarks', 'store/phaseroots', 'store/00changelog.i')
+        yield '# %s\n' % remotepath
+        for relname in filelist:
+            absname = os.path.normpath(self._repo.join(relname))
+            yield '%s = %s\n' % (absname, _calcfilehash(absname))
+
+    def _getstorehashcachepath(self, remotepath):
+        '''get a unique path for the store hash cache'''
+        return self._repo.join(os.path.join(
+            'cache', 'storehash', _getstorehashcachename(remotepath)))
+
+    def _readstorehashcache(self, remotepath):
+        '''read the store hash cache for a given remote repository'''
+        cachefile = self._getstorehashcachepath(remotepath)
+        if not os.path.exists(cachefile):
+            return ''
+        fd = open(cachefile, 'r')
+        pullstate = fd.readlines()
+        fd.close()
+        return pullstate
+
+    def _cachestorehash(self, remotepath):
+        '''cache the current store hash
+
+        Each remote repo requires its own store hash cache, because a subrepo
+        store may be "clean" versus a given remote repo, but not versus another
+        '''
+        cachefile = self._getstorehashcachepath(remotepath)
+        lock = self._repo.lock()
+        storehash = list(self._calcstorehash(remotepath))
+        cachedir = os.path.dirname(cachefile)
+        if not os.path.exists(cachedir):
+            util.makedirs(cachedir, notindexed=True)
+        fd = open(cachefile, 'w')
+        fd.writelines(storehash)
+        fd.close()
+        lock.release()
+
     @annotatesubrepoerror
     def _initrepo(self, parentrepo, source, create):
         self._repo._subparent = parentrepo
@@ -564,12 +630,17 @@ 
                                          update=False)
                 self._repo = cloned.local()
                 self._initrepo(parentrepo, source, create=True)
+                self._cachestorehash(srcurl)
             else:
                 self._repo.ui.status(_('pulling subrepo %s from %s\n')
                                      % (subrelpath(self), srcurl))
+                cleansub = self.storeclean(srcurl)
                 self._repo.pull(other)
                 bookmarks.updatefromremote(self._repo.ui, self._repo, other,
                                            srcurl)
+                if cleansub:
+                    # keep the repo clean after pull
+                    self._cachestorehash(srcurl)
 
     @annotatesubrepoerror
     def get(self, state, overwrite=False):
@@ -622,7 +693,11 @@ 
         self._repo.ui.status(_('pushing subrepo %s to %s\n') %
             (subrelpath(self), dsturl))
         other = hg.peer(self._repo, {'ssh': ssh}, dsturl)
-        return self._repo.push(other, force, newbranch=newbranch)
+        res = self._repo.push(other, force, newbranch=newbranch)
+
+        # the repo is now clean
+        self._cachestorehash(dsturl)
+        return res
 
     @annotatesubrepoerror
     def outgoing(self, ui, dest, opts):