Patchwork [1,of,5] transaction: change the on disk format for backupentries

login
register
mail settings
Submitter Pierre-Yves David
Date Nov. 13, 2014, 11:39 p.m.
Message ID <dba745ba04b9bc614d42.1415921957@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/6722/
State Accepted
Commit 7eb520f5efe4d07ba21b1a82b61a4d542722d3d4
Headers show

Comments

Pierre-Yves David - Nov. 13, 2014, 11:39 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1415152366 0
#      Wed Nov 05 01:52:46 2014 +0000
# Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948
# Parent  e44399c494ab23cefb8c99087529f9b4b06d398e
transaction: change the on disk format for backupentries

We need to store new data to improve the current transaction logic:

- location: We want to generate and backup file outside of the 'store' (eg:
  bookmarks, or various cache files). This requires knowing and preserving where
  each file is located. The value of this new field is a string. It will be used
  as a key for a vfs mapping.

- cache: We would like to handle cache file in the transaction code. This
  Will help to have cache consistent with the repository state and avoid
  performance issue on big repository like Mozilla. However, failure to handle
  cache file should not result in a transaction failure. We add a new field that
  carry this information. The value is boolean, A True value mean any error
  while handling this file can be ignored.

Those two mechanisms are not implemented yet, but they are now persisted in the
on disk file. Support for new mechanisms is coming in later changeset.

We update the file format now and will introduce the new features in later
changeset. The format version is set to 0 until we actually support the new feature.
This will prevent misunderstanding between incomplete and final client.

Support for reading both version 1 and (future) version 2 could be achieved
(using default value when reading version 1) but has not been seen as necessary
for now.
Gregory Szorc - Nov. 14, 2014, 6:08 a.m.
On 11/13/14 3:39 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1415152366 0
> #      Wed Nov 05 01:52:46 2014 +0000
> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948
> # Parent  e44399c494ab23cefb8c99087529f9b4b06d398e
> transaction: change the on disk format for backupentries

So I guess we aren't going with the solution that was brainstormed at 
the Munich sprint? (See "Transactions" in 
https://titanpad.com/mercurial32-sprint)

Or are you going with an easy solution to transactions before the 
complicated one (that will support reader locks)?
Pierre-Yves David - Nov. 14, 2014, 8:30 a.m.
On 11/14/2014 06:08 AM, Gregory Szorc wrote:
> On 11/13/14 3:39 PM, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1415152366 0
>> #      Wed Nov 05 01:52:46 2014 +0000
>> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948
>> # Parent  e44399c494ab23cefb8c99087529f9b4b06d398e
>> transaction: change the on disk format for backupentries
>
> So I guess we aren't going with the solution that was brainstormed at
> the Munich sprint? (See "Transactions" in
> https://titanpad.com/mercurial32-sprint)
>
> Or are you going with an easy solution to transactions before the
> complicated one (that will support reader locks)?

I'm starting for a simple solution first. The complexe solution 
discussed at the munich sprint is "useless" if nobody is playing the 
transaction game. So I first need to consolidate our transaction logic 
to ensure basic property:

- All contents written during the transaction is only made visible at 
the end of the transaction
- All content written during the transaction is made available to hooks 
and hooks only right before we commit.
- Any temporary file are properly cleaned up
- Any file involved in properly backed up in case of error.

Once I've this true, I can move to the next step: Change the format to 
ensure an atomic "write" of the data from a reader perspective. But I 
needs some to write atomically in the first place ;-)


(something that might interest you: I've a transaction cache generation 
working in my repo)
Pierre-Yves David - Nov. 14, 2014, 12:14 p.m.
On 11/14/2014 08:30 AM, Pierre-Yves David wrote:
> Once I've this true, I can move to the next step: Change the format to
> ensure an atomic "write" of the data from a reader perspective. But I
> needs some to write atomically in the first place ;-)

Also: the thing discussed during the sprint requires a repository format 
change. I would be happy if old client//repo can get some of the candy 
(with still some race condition possible, but far less).
Gregory Szorc - Nov. 14, 2014, 7:01 p.m.
On 11/14/14 12:30 AM, Pierre-Yves David wrote:
>
>
> On 11/14/2014 06:08 AM, Gregory Szorc wrote:
>> On 11/13/14 3:39 PM, Pierre-Yves David wrote:
>>> # HG changeset patch
>>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>>> # Date 1415152366 0
>>> #      Wed Nov 05 01:52:46 2014 +0000
>>> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948
>>> # Parent  e44399c494ab23cefb8c99087529f9b4b06d398e
>>> transaction: change the on disk format for backupentries
>>
>> So I guess we aren't going with the solution that was brainstormed at
>> the Munich sprint? (See "Transactions" in
>> https://titanpad.com/mercurial32-sprint)
>>
>> Or are you going with an easy solution to transactions before the
>> complicated one (that will support reader locks)?
>
> I'm starting for a simple solution first. The complexe solution
> discussed at the munich sprint is "useless" if nobody is playing the
> transaction game. So I first need to consolidate our transaction logic
> to ensure basic property:
>
> - All contents written during the transaction is only made visible at
> the end of the transaction
> - All content written during the transaction is made available to hooks
> and hooks only right before we commit.
> - Any temporary file are properly cleaned up
> - Any file involved in properly backed up in case of error.
>
> Once I've this true, I can move to the next step: Change the format to
> ensure an atomic "write" of the data from a reader perspective. But I
> needs some to write atomically in the first place ;-)

Makes sense to me.

I am very excited about seeing these patches. As you may know, cache 
invalidation causing an expensive rebuild is a big problem for large 
repositories. We are seeing these expensive rebuilds occur after a 
pretxnchangegroup hook fails a push, for example. I can't wait to deploy 
3.3 at Mozilla!

Of course, we still see cache invalidation after strip operations. We'll 
need memctx everywhere, hidden changesets enabled by default, and people 
to stop using strip-based workflows (mq) before this is fully resolved, 
I fear. But this transaction work is refreshing nonetheless.
Pierre-Yves David - Nov. 14, 2014, 7:09 p.m.
On 11/14/2014 07:01 PM, Gregory Szorc wrote:
> On 11/14/14 12:30 AM, Pierre-Yves David wrote:
>>
>>
>> On 11/14/2014 06:08 AM, Gregory Szorc wrote:
>>> On 11/13/14 3:39 PM, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>>>> # Date 1415152366 0
>>>> #      Wed Nov 05 01:52:46 2014 +0000
>>>> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948
>>>> # Parent  e44399c494ab23cefb8c99087529f9b4b06d398e
>>>> transaction: change the on disk format for backupentries
>>>
>>> So I guess we aren't going with the solution that was brainstormed at
>>> the Munich sprint? (See "Transactions" in
>>> https://titanpad.com/mercurial32-sprint)
>>>
>>> Or are you going with an easy solution to transactions before the
>>> complicated one (that will support reader locks)?
>>
>> I'm starting for a simple solution first. The complexe solution
>> discussed at the munich sprint is "useless" if nobody is playing the
>> transaction game. So I first need to consolidate our transaction logic
>> to ensure basic property:
>>
>> - All contents written during the transaction is only made visible at
>> the end of the transaction
>> - All content written during the transaction is made available to hooks
>> and hooks only right before we commit.
>> - Any temporary file are properly cleaned up
>> - Any file involved in properly backed up in case of error.
>>
>> Once I've this true, I can move to the next step: Change the format to
>> ensure an atomic "write" of the data from a reader perspective. But I
>> needs some to write atomically in the first place ;-)
>
> Makes sense to me.
>
> I am very excited about seeing these patches. As you may know, cache
> invalidation causing an expensive rebuild is a big problem for large
> repositories. We are seeing these expensive rebuilds occur after a
> pretxnchangegroup hook fails a push, for example. I can't wait to deploy
> 3.3 at Mozilla!
>
> Of course, we still see cache invalidation after strip operations. We'll
> need memctx everywhere, hidden changesets enabled by default, and people
> to stop using strip-based workflows (mq) before this is fully resolved,
> I fear. But this transaction work is refreshing nonetheless.

Note: my current plan it to put the infrastructure in place (using the 
simplest cache we have to test it: hidden cache). I then expect an army 
of mozillian leprechaun to pop and do the work to move bigger cache in 
the transaction.

Patch

diff --git a/mercurial/transaction.py b/mercurial/transaction.py
--- a/mercurial/transaction.py
+++ b/mercurial/transaction.py
@@ -13,11 +13,11 @@ 
 
 from i18n import _
 import errno
 import error, util
 
-version = 1
+version = 0
 
 def active(func):
     def _active(self, *args, **kwds):
         if self.count == 0:
             raise error.Abort(_(
@@ -41,11 +41,11 @@  def _playback(journal, report, opener, e
             except (IOError, OSError), inst:
                 if inst.errno != errno.ENOENT:
                     raise
 
     backupfiles = []
-    for f, b in backupentries:
+    for l, f, b, c in backupentries:
         if f and b:
             filepath = opener.join(f)
             backuppath = opener.join(b)
             try:
                 util.copyfile(backuppath, filepath)
@@ -97,13 +97,14 @@  class transaction(object):
         self._queue = []
         # a dict of arguments to be passed to hooks
         self.hookargs = {}
         self.file = opener.open(self.journal, "w")
 
-        # a list of ('path', 'backuppath') entries.
+        # a list of ('location', 'path', 'backuppath', cache) entries.
         # if 'backuppath' is empty, no file existed at backup time
         # if 'path' is empty, this is a temporary transaction file
+        # (location, and cache are current unused)
         self._backupentries = []
         self._backupmap = {}
         self._backupjournal = "%s.backupfiles" % journal
         self._backupsfile = opener.open(self._backupjournal, 'w')
         self._backupsfile.write('%d\n' % version)
@@ -191,27 +192,27 @@  class transaction(object):
             backuppath = self.opener.join(backupfile)
             util.copyfiles(filepath, backuppath, hardlink=hardlink)
         else:
             backupfile = ''
 
-        self._addbackupentry((file, backupfile))
+        self._addbackupentry(('', file, backupfile, False))
 
     def _addbackupentry(self, entry):
         """register a new backup entry and write it to disk"""
         self._backupentries.append(entry)
         self._backupmap[file] = len(self._backupentries) - 1
-        self._backupsfile.write("%s\0%s\n" % entry)
+        self._backupsfile.write("%s\0%s\0%s\0%d\n" % entry)
         self._backupsfile.flush()
 
     @active
     def registertmp(self, tmpfile):
         """register a temporary transaction file
 
         Such file will be delete when the transaction exit (on both failure and
         success).
         """
-        self._addbackupentry(('', tmpfile))
+        self._addbackupentry(('', '', tmpfile, False))
 
     @active
     def addfilegenerator(self, genid, filenames, genfunc, order=0, vfs=None):
         """add a function to generates some files at transaction commit
 
@@ -353,21 +354,21 @@  class transaction(object):
         if self.count != 0:
             return
         self.file.close()
         self._backupsfile.close()
         # cleanup temporary files
-        for f, b in self._backupentries:
+        for _l, f, b, _c in self._backupentries:
             if not f and b and self.opener.exists(b):
                 self.opener.unlink(b)
         self.entries = []
         if self.after:
             self.after()
         if self.opener.isfile(self.journal):
             self.opener.unlink(self.journal)
         if self.opener.isfile(self._backupjournal):
             self.opener.unlink(self._backupjournal)
-            for _f, b in self._backupentries:
+            for _l, _f, b, _c in self._backupentries:
                 if b and self.opener.exists(b):
                     self.opener.unlink(b)
         self._backupentries = []
         self.journal = None
         # run post close action
@@ -445,12 +446,12 @@  def rollback(opener, file, report):
             if ver == str(version):
                 for line in lines[1:]:
                     if line:
                         # Shave off the trailing newline
                         line = line[:-1]
-                        f, b = line.split('\0')
-                        backupentries.append((f, b))
+                        l, f, b, c = line.split('\0')
+                        backupentries.append((l, f, b, bool(c)))
             else:
-                report(_("journal was created by a newer version of "
+                report(_("journal was created by a different version of "
                          "Mercurial"))
 
     _playback(file, report, opener, entries, backupentries)