Patchwork [2,of,3,V3] journal: add share extension support

login
register
mail settings
Submitter Martijn Pieters
Date July 7, 2016, 5 p.m.
Message ID <e125ff50245cb519307d.1467910806@mjpieters-mbp.dhcp.thefacebook.com>
Download mbox | patch
Permalink /patch/15769/
State Changes Requested
Delegated to: Yuya Nishihara
Headers show

Comments

Martijn Pieters - July 7, 2016, 5 p.m.
# HG changeset patch
# User Martijn Pieters <mjpieters@fb.com>
# Date 1467714538 -3600
#      Tue Jul 05 11:28:58 2016 +0100
# Node ID e125ff50245cb519307dc75d606c9c8061784488
# Parent  1f282c1e1df71e1a433df354000feed5c06ad625
journal: add share extension support

Rather than put everything into one journal file, split entries up in *shared*
and *local* entries. Working copy changes are local to a specific working copy,
so should remain local only. Other entries are shared with the source if so
configured when the share was created.

When unsharing, any shared journale entries are copied across.
Yuya Nishihara - July 8, 2016, 3:24 p.m.
On Thu, 07 Jul 2016 18:00:06 +0100, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters@fb.com>
> # Date 1467714538 -3600
> #      Tue Jul 05 11:28:58 2016 +0100
> # Node ID e125ff50245cb519307dc75d606c9c8061784488
> # Parent  1f282c1e1df71e1a433df354000feed5c06ad625
> journal: add share extension support
>
> Rather than put everything into one journal file, split entries up in *shared*
> and *local* entries. Working copy changes are local to a specific working copy,
> so should remain local only. Other entries are shared with the source if so
> configured when the share was created.

So operations are recorded to the journal storage where they should belong,
which makes me feel the new shared feature, "journal", is unnecessary. We'd
always want to record bookmark changes to the source repo if bookmarks are
shared.

> +def _mergeentriesiter(*iterables, **kwargs):
> +    """Given a set of sorted iterables, yield the next entry in merged order
> +
> +    Note that by default entries go from most recent to oldest.
> +    """
> +    order = kwargs.pop('order', max)
> +    iterables = [iter(it) for it in iterables]
> +    # this tracks still active iterables; iterables are deleted as they are
> +    # exhausted, which is why this is a dictionary and why each entry also
> +    # stores the key. Entries are mutable so we can store the next value each
> +    # time.
> +    iterable_map = {}
> +    for key, it in enumerate(iterables):
> +        try:
> +            iterable_map[key] = [next(it), key, it]
> +        except StopIteration:
> +            # empty entry, can be ignored
> +            pass
> +    if not iterable_map:
> +        # all iterables where empty
> +        return
> +
> +    while True:

Nit: could be "while iterable_map:"

> +        value, key, it = order(iterable_map.itervalues())
> +        yield value
> +        try:
> +            iterable_map[key][0] = next(it)
> +        except StopIteration:
> +            # this iterable is empty, remove it from consideration
> +            del iterable_map[key]
> +            if not iterable_map:
> +                # all iterables are empty
> +                return

> +def unsharejournal(orig, ui, repo, repopath):
> +    """Copy shared journal entries into this repo when unsharing"""
> +    if repo.shared() and util.safehasattr(repo, 'journal'):
> +        sharedrepo = share._getsrcrepo(repo)
> +        sharedfeatures = _readsharedfeatures(repo)
> +        if sharedrepo and sharedfeatures > set(['journal']):
> +            # there is a shared repository and there are shared journal entries
> +            # to copy. move shared date over from source to destination but
> +            # move the local file first
> +            if repo.vfs.exists('journal'):
> +                journalpath = repo.join('journal')
> +                util.rename(journalpath, journalpath + '.bak')
> +            storage = repo.journal
> +            local = storage._open(
> +                repo.vfs, filename='journal.bak', _newestfirst=False)
> +            shared = (
> +                e for e in storage._open(sharedrepo.vfs, _newestfirst=False)
> +                if sharednamespaces.get(e.namespace) in sharedfeatures)
> +            for entry in _mergeentriesiter(local, shared, order=min):
> +                storage._write(repo.vfs, entry)

"hg clone" of shared repo will add wrong record to the source repo.

  $ hg share foo bar
  $ hg clone bar baz  # repo=bar, repopath=baz

> +        # is this working copy using a shared storage?
> +        self.sharedfeatures = self.svfs = None
> +        if repo.shared():
> +            features = _readsharedfeatures(repo)
> +            sharedrepo = share._getsrcrepo(repo)
> +            if sharedrepo is not None and 'journal' in features:
> +                self.svfs = sharedrepo.vfs
> +                self.sharedfeatures = features

Nit: "svfs" seems confusing since repo.svfs is store, but journal.svfs is
non-store of shared repo.
Martijn Pieters - July 11, 2016, 1:07 p.m.
On 8 July 2016 at 16:24, Yuya Nishihara <yuya@tcha.org> wrote:
>> +def _mergeentriesiter(*iterables, **kwargs):
>> +    """Given a set of sorted iterables, yield the next entry in merged order
>> +
>> +    Note that by default entries go from most recent to oldest.
>> +    """
>> +    order = kwargs.pop('order', max)
>> +    iterables = [iter(it) for it in iterables]
>> +    # this tracks still active iterables; iterables are deleted as they are
>> +    # exhausted, which is why this is a dictionary and why each entry also
>> +    # stores the key. Entries are mutable so we can store the next value each
>> +    # time.
>> +    iterable_map = {}
>> +    for key, it in enumerate(iterables):
>> +        try:
>> +            iterable_map[key] = [next(it), key, it]
>> +        except StopIteration:
>> +            # empty entry, can be ignored
>> +            pass
>> +    if not iterable_map:
>> +        # all iterables where empty
>> +        return
>> +
>> +    while True:
>
> Nit: could be "while iterable_map:"

Nice one, that also lets me drop the `if not iterable_map: return` before it.

>> +def unsharejournal(orig, ui, repo, repopath):
>> +    """Copy shared journal entries into this repo when unsharing"""
>> +    if repo.shared() and util.safehasattr(repo, 'journal'):
>> +        sharedrepo = share._getsrcrepo(repo)
>> +        sharedfeatures = _readsharedfeatures(repo)
>> +        if sharedrepo and sharedfeatures > set(['journal']):
>> +            # there is a shared repository and there are shared journal entries
>> +            # to copy. move shared date over from source to destination but
>> +            # move the local file first
>> +            if repo.vfs.exists('journal'):
>> +                journalpath = repo.join('journal')
>> +                util.rename(journalpath, journalpath + '.bak')
>> +            storage = repo.journal
>> +            local = storage._open(
>> +                repo.vfs, filename='journal.bak', _newestfirst=False)
>> +            shared = (
>> +                e for e in storage._open(sharedrepo.vfs, _newestfirst=False)
>> +                if sharednamespaces.get(e.namespace) in sharedfeatures)
>> +            for entry in _mergeentriesiter(local, shared, order=min):
>> +                storage._write(repo.vfs, entry)
>
> "hg clone" of shared repo will add wrong record to the source repo.
>
>   $ hg share foo bar
>   $ hg clone bar baz  # repo=bar, repopath=baz

Ah, of course. I've added a test to see if `repo.path` is the same as
the `repopath` argument.

>> +        # is this working copy using a shared storage?
>> +        self.sharedfeatures = self.svfs = None
>> +        if repo.shared():
>> +            features = _readsharedfeatures(repo)
>> +            sharedrepo = share._getsrcrepo(repo)
>> +            if sharedrepo is not None and 'journal' in features:
>> +                self.svfs = sharedrepo.vfs
>> +                self.sharedfeatures = features
>
> Nit: "svfs" seems confusing since repo.svfs is store, but journal.svfs is
> non-store of shared repo.

Renamed to sharedvfs instead.
Yuya Nishihara - July 11, 2016, 1:26 p.m.
On Sat, 9 Jul 2016 00:24:13 +0900, Yuya Nishihara wrote:
> On Thu, 07 Jul 2016 18:00:06 +0100, Martijn Pieters wrote:
> > # HG changeset patch
> > # User Martijn Pieters <mjpieters@fb.com>
> > # Date 1467714538 -3600
> > #      Tue Jul 05 11:28:58 2016 +0100
> > # Node ID e125ff50245cb519307dc75d606c9c8061784488
> > # Parent  1f282c1e1df71e1a433df354000feed5c06ad625
> > journal: add share extension support
> >
> > Rather than put everything into one journal file, split entries up in *shared*
> > and *local* entries. Working copy changes are local to a specific working copy,
> > so should remain local only. Other entries are shared with the source if so
> > configured when the share was created.  
> 
> So operations are recorded to the journal storage where they should belong,
> which makes me feel the new shared feature, "journal", is unnecessary. We'd
> always want to record bookmark changes to the source repo if bookmarks are
> shared.

Any thought about this?

In short, I don't think it's necessary to write "journal\n" to .hg/shared.
Martijn Pieters - July 11, 2016, 1:41 p.m.
On 11 July 2016 at 14:26, Yuya Nishihara <yuya@tcha.org> wrote:
>> So operations are recorded to the journal storage where they should belong,
>> which makes me feel the new shared feature, "journal", is unnecessary. We'd
>> always want to record bookmark changes to the source repo if bookmarks are
>> shared.
>
> Any thought about this?
>
> In short, I don't think it's necessary to write "journal\n" to .hg/shared.


My apologies, I meant to respond to that too but forgot.

In theory you could have the journal extension disabled when you first
shared; the 'journal\n' entry in .hg/shared shows that it was enabled
when sharing started. This allows for the journal to be enabled only
for one of the shared working copies.
Yuya Nishihara - July 11, 2016, 1:58 p.m.
On Mon, 11 Jul 2016 14:41:24 +0100, Martijn Pieters wrote:
> In theory you could have the journal extension disabled when you first
> shared; the 'journal\n' entry in .hg/shared shows that it was enabled
> when sharing started. This allows for the journal to be enabled only
> for one of the shared working copies.

Got it, thanks.

Patch

diff --git a/hgext/journal.py b/hgext/journal.py
--- a/hgext/journal.py
+++ b/hgext/journal.py
@@ -14,6 +14,7 @@ 
 from __future__ import absolute_import
 
 import collections
+import errno
 import os
 import weakref
 
@@ -27,12 +28,15 @@ 
     dispatch,
     error,
     extensions,
+    hg,
     localrepo,
     lock,
     node,
     util,
 )
 
+from . import share
+
 cmdtable = {}
 command = cmdutil.command(cmdtable)
 
@@ -48,6 +52,11 @@ 
 # namespaces
 bookmarktype = 'bookmark'
 wdirparenttype = 'wdirparent'
+# In a shared repository, what shared feature name is used
+# to indicate this namespace is shared with the source?
+sharednamespaces = {
+    bookmarktype: hg.sharedbookmarks,
+}
 
 # Journal recording, register hooks and storage object
 def extsetup(ui):
@@ -57,6 +66,8 @@ 
         dirstate.dirstate, '_writedirstate', recorddirstateparents)
     extensions.wrapfunction(
         localrepo.localrepository.dirstate, 'func', wrapdirstate)
+    extensions.wrapfunction(hg, 'postshare', wrappostshare)
+    extensions.wrapfunction(hg, 'copystore', unsharejournal)
 
 def reposetup(ui, repo):
     if repo.local():
@@ -114,6 +125,79 @@ 
                 repo.journal.record(bookmarktype, mark, oldvalue, value)
     return orig(store, fp)
 
+# shared repository support
+def _readsharedfeatures(repo):
+    """A set of shared features for this repository"""
+    try:
+        return set(repo.vfs.read('shared').splitlines())
+    except IOError as inst:
+        if inst.errno != errno.ENOENT:
+            raise
+        return set()
+
+def _mergeentriesiter(*iterables, **kwargs):
+    """Given a set of sorted iterables, yield the next entry in merged order
+
+    Note that by default entries go from most recent to oldest.
+    """
+    order = kwargs.pop('order', max)
+    iterables = [iter(it) for it in iterables]
+    # this tracks still active iterables; iterables are deleted as they are
+    # exhausted, which is why this is a dictionary and why each entry also
+    # stores the key. Entries are mutable so we can store the next value each
+    # time.
+    iterable_map = {}
+    for key, it in enumerate(iterables):
+        try:
+            iterable_map[key] = [next(it), key, it]
+        except StopIteration:
+            # empty entry, can be ignored
+            pass
+    if not iterable_map:
+        # all iterables where empty
+        return
+
+    while True:
+        value, key, it = order(iterable_map.itervalues())
+        yield value
+        try:
+            iterable_map[key][0] = next(it)
+        except StopIteration:
+            # this iterable is empty, remove it from consideration
+            del iterable_map[key]
+            if not iterable_map:
+                # all iterables are empty
+                return
+
+def wrappostshare(orig, sourcerepo, destrepo, **kwargs):
+    """Mark this shared working copy as sharing journal information"""
+    orig(sourcerepo, destrepo, **kwargs)
+    with destrepo.vfs('shared', 'a') as fp:
+        fp.write('journal\n')
+
+def unsharejournal(orig, ui, repo, repopath):
+    """Copy shared journal entries into this repo when unsharing"""
+    if repo.shared() and util.safehasattr(repo, 'journal'):
+        sharedrepo = share._getsrcrepo(repo)
+        sharedfeatures = _readsharedfeatures(repo)
+        if sharedrepo and sharedfeatures > set(['journal']):
+            # there is a shared repository and there are shared journal entries
+            # to copy. move shared date over from source to destination but
+            # move the local file first
+            if repo.vfs.exists('journal'):
+                journalpath = repo.join('journal')
+                util.rename(journalpath, journalpath + '.bak')
+            storage = repo.journal
+            local = storage._open(
+                repo.vfs, filename='journal.bak', _newestfirst=False)
+            shared = (
+                e for e in storage._open(sharedrepo.vfs, _newestfirst=False)
+                if sharednamespaces.get(e.namespace) in sharedfeatures)
+            for entry in _mergeentriesiter(local, shared, order=min):
+                storage._write(repo.vfs, entry)
+
+    return orig(ui, repo, repopath)
+
 class journalentry(collections.namedtuple(
         'journalentry',
         'timestamp user command namespace name oldhashes newhashes')):
@@ -157,6 +241,10 @@ 
 class journalstorage(object):
     """Storage for journal entries
 
+    Entries are divided over two files; one with entries that pertain to the
+    local working copy *only*, and one with entries that are shared across
+    multiple working copies when shared using the share extension.
+
     Entries are stored with NUL bytes as separators. See the journalentry
     class for the per-entry structure.
 
@@ -175,6 +263,15 @@ 
         self.ui = repo.ui
         self.vfs = repo.vfs
 
+        # is this working copy using a shared storage?
+        self.sharedfeatures = self.svfs = None
+        if repo.shared():
+            features = _readsharedfeatures(repo)
+            sharedrepo = share._getsrcrepo(repo)
+            if sharedrepo is not None and 'journal' in features:
+                self.svfs = sharedrepo.vfs
+                self.sharedfeatures = features
+
     # track the current command for recording in journal entries
     @property
     def command(self):
@@ -192,19 +289,19 @@ 
         # with a non-local repo (cloning for example).
         cls._currentcommand = fullargs
 
-    def jlock(self):
+    def jlock(self, vfs):
         """Create a lock for the journal file"""
         if self._lockref and self._lockref():
             raise error.Abort(_('journal lock does not support nesting'))
-        desc = _('journal of %s') % self.vfs.base
+        desc = _('journal of %s') % vfs.base
         try:
-            l = lock.lock(self.vfs, 'journal.lock', 0, desc=desc)
+            l = lock.lock(vfs, 'journal.lock', 0, desc=desc)
         except error.LockHeld as inst:
             self.ui.warn(
                 _("waiting for lock on %s held by %r\n") % (desc, inst.locker))
             # default to 600 seconds timeout
             l = lock.lock(
-                self.vfs, 'journal.lock',
+                vfs, 'journal.lock',
                 int(self.ui.config("ui", "timeout", "600")), desc=desc)
             self.ui.warn(_("got lock after %s seconds\n") % l.delay)
         self._lockref = weakref.ref(l)
@@ -231,10 +328,20 @@ 
             util.makedate(), self.user, self.command, namespace, name,
             oldhashes, newhashes)
 
-        with self.jlock():
+        vfs = self.vfs
+        if self.svfs is not None:
+            # write to the shared repository if this feature is being
+            # shared between working copies.
+            if sharednamespaces.get(namespace) in self.sharedfeatures:
+                vfs = self.svfs
+
+        self._write(vfs, entry)
+
+    def _write(self, vfs, entry):
+        with self.jlock(vfs):
             version = None
             # open file in amend mode to ensure it is created if missing
-            with self.vfs('journal', mode='a+b', atomictemp=True) as f:
+            with vfs('journal', mode='a+b', atomictemp=True) as f:
                 f.seek(0, os.SEEK_SET)
                 # Read just enough bytes to get a version number (up to 2
                 # digits plus separator)
@@ -273,10 +380,23 @@ 
         Yields journalentry instances for each contained journal record.
 
         """
-        if not self.vfs.exists('journal'):
+        local = self._open(self.vfs)
+
+        if self.svfs is None:
+            return local
+
+        # iterate over both local and shared entries, but only those
+        # shared entries that are among the currently shared features
+        shared = (
+            e for e in self._open(self.svfs)
+            if sharednamespaces.get(e.namespace) in self.sharedfeatures)
+        return _mergeentriesiter(local, shared)
+
+    def _open(self, vfs, filename='journal', _newestfirst=True):
+        if not vfs.exists(filename):
             return
 
-        with self.vfs('journal') as f:
+        with vfs(filename) as f:
             raw = f.read()
 
         lines = raw.split('\0')
@@ -285,8 +405,12 @@ 
             version = version or _('not available')
             raise error.Abort(_("unknown journal file version '%s'") % version)
 
-        # Skip the first line, it's a version number. Reverse the rest.
-        lines = reversed(lines[1:])
+        # Skip the first line, it's a version number. Normally we iterate over
+        # these in reverse order to list newest first; only when copying across
+        # a shared storage do we forgo reversing.
+        lines = lines[1:]
+        if _newestfirst:
+            lines = reversed(lines)
         for line in lines:
             if not line:
                 continue
diff --git a/tests/test-journal-share.t b/tests/test-journal-share.t
new file mode 100644
--- /dev/null
+++ b/tests/test-journal-share.t
@@ -0,0 +1,153 @@ 
+Journal extension test: tests the share extension support
+
+  $ cat >> testmocks.py << EOF
+  > # mock out util.getuser() and util.makedate() to supply testable values
+  > import os
+  > from mercurial import util
+  > def mockgetuser():
+  >     return 'foobar'
+  > 
+  > def mockmakedate():
+  >     filename = os.path.join(os.environ['TESTTMP'], 'testtime')
+  >     try:
+  >         with open(filename, 'rb') as timef:
+  >             time = float(timef.read()) + 1
+  >     except IOError:
+  >         time = 0.0
+  >     with open(filename, 'wb') as timef:
+  >         timef.write(str(time))
+  >     return (time, 0)
+  > 
+  > util.getuser = mockgetuser
+  > util.makedate = mockmakedate
+  > EOF
+
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > journal=
+  > share=
+  > testmocks=`pwd`/testmocks.py
+  > [remotenames]
+  > rename.default=remote
+  > EOF
+
+  $ hg init repo
+  $ cd repo
+  $ hg bookmark bm
+  $ touch file0
+  $ hg commit -Am 'file0 added'
+  adding file0
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         commit -Am 'file0 added'
+  5640b525682e  bm        commit -Am 'file0 added'
+
+A shared working copy initially receives the same bookmarks and working copy
+
+  $ cd ..
+  $ hg share repo shared1
+  updating working directory
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ cd shared1
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         share repo shared1
+
+unless you explicitly share bookmarks
+
+  $ cd ..
+  $ hg share --bookmarks repo shared2
+  updating working directory
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ cd shared2
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         share --bookmarks repo shared2
+  5640b525682e  bm        commit -Am 'file0 added'
+
+Moving the bookmark in the original repository is only shown in the repository
+that shares bookmarks
+
+  $ cd ../repo
+  $ touch file1
+  $ hg commit -Am "file1 added"
+  adding file1
+  $ cd ../shared1
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         share repo shared1
+  $ cd ../shared2
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  6432d239ac5d  bm        commit -Am 'file1 added'
+  5640b525682e  .         share --bookmarks repo shared2
+  5640b525682e  bm        commit -Am 'file0 added'
+
+But working copy changes are always 'local'
+
+  $ cd ../repo
+  $ hg up 0
+  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
+  (leaving bookmark bm)
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         up 0
+  6432d239ac5d  .         commit -Am 'file1 added'
+  6432d239ac5d  bm        commit -Am 'file1 added'
+  5640b525682e  .         commit -Am 'file0 added'
+  5640b525682e  bm        commit -Am 'file0 added'
+  $ cd ../shared2
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  6432d239ac5d  bm        commit -Am 'file1 added'
+  5640b525682e  .         share --bookmarks repo shared2
+  5640b525682e  bm        commit -Am 'file0 added'
+  $ hg up tip
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg up 0
+  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
+  $ hg journal
+  previous locations of '.':
+  5640b525682e  up 0
+  6432d239ac5d  up tip
+  5640b525682e  share --bookmarks repo shared2
+
+Unsharing works as expected; the journal remains consistent
+
+  $ cd ../shared1
+  $ hg unshare
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         share repo shared1
+  $ cd ../shared2
+  $ hg unshare
+  $ hg journal --all
+  previous locations of the working copy and bookmarks:
+  5640b525682e  .         up 0
+  6432d239ac5d  .         up tip
+  6432d239ac5d  bm        commit -Am 'file1 added'
+  5640b525682e  .         share --bookmarks repo shared2
+  5640b525682e  bm        commit -Am 'file0 added'
+
+New journal entries in the source repo no longer show up in the other working copies
+
+  $ cd ../repo
+  $ hg bookmark newbm -r tip
+  $ hg journal newbm
+  previous locations of 'newbm':
+  6432d239ac5d  bookmark newbm -r tip
+  $ cd ../shared2
+  $ hg journal newbm
+  previous locations of 'newbm':
+  no recorded locations
+
+This applies for both directions
+
+  $ hg bookmark shared2bm -r tip
+  $ hg journal shared2bm
+  previous locations of 'shared2bm':
+  6432d239ac5d  bookmark shared2bm -r tip
+  $ cd ../repo
+  $ hg journal shared2bm
+  previous locations of 'shared2bm':
+  no recorded locations