Submitter | Pierre-Yves David |
---|---|
Date | Nov. 13, 2014, 11:39 p.m. |
Message ID | <dba745ba04b9bc614d42.1415921957@marginatus.alto.octopoid.net> |
Download | mbox | patch |
Permalink | /patch/6722/ |
State | Accepted |
Commit | 7eb520f5efe4d07ba21b1a82b61a4d542722d3d4 |
Headers | show |
Comments
On 11/13/14 3:39 PM, Pierre-Yves David wrote: > # HG changeset patch > # User Pierre-Yves David <pierre-yves.david@fb.com> > # Date 1415152366 0 > # Wed Nov 05 01:52:46 2014 +0000 > # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948 > # Parent e44399c494ab23cefb8c99087529f9b4b06d398e > transaction: change the on disk format for backupentries So I guess we aren't going with the solution that was brainstormed at the Munich sprint? (See "Transactions" in https://titanpad.com/mercurial32-sprint) Or are you going with an easy solution to transactions before the complicated one (that will support reader locks)?
On 11/14/2014 06:08 AM, Gregory Szorc wrote: > On 11/13/14 3:39 PM, Pierre-Yves David wrote: >> # HG changeset patch >> # User Pierre-Yves David <pierre-yves.david@fb.com> >> # Date 1415152366 0 >> # Wed Nov 05 01:52:46 2014 +0000 >> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948 >> # Parent e44399c494ab23cefb8c99087529f9b4b06d398e >> transaction: change the on disk format for backupentries > > So I guess we aren't going with the solution that was brainstormed at > the Munich sprint? (See "Transactions" in > https://titanpad.com/mercurial32-sprint) > > Or are you going with an easy solution to transactions before the > complicated one (that will support reader locks)? I'm starting for a simple solution first. The complexe solution discussed at the munich sprint is "useless" if nobody is playing the transaction game. So I first need to consolidate our transaction logic to ensure basic property: - All contents written during the transaction is only made visible at the end of the transaction - All content written during the transaction is made available to hooks and hooks only right before we commit. - Any temporary file are properly cleaned up - Any file involved in properly backed up in case of error. Once I've this true, I can move to the next step: Change the format to ensure an atomic "write" of the data from a reader perspective. But I needs some to write atomically in the first place ;-) (something that might interest you: I've a transaction cache generation working in my repo)
On 11/14/2014 08:30 AM, Pierre-Yves David wrote: > Once I've this true, I can move to the next step: Change the format to > ensure an atomic "write" of the data from a reader perspective. But I > needs some to write atomically in the first place ;-) Also: the thing discussed during the sprint requires a repository format change. I would be happy if old client//repo can get some of the candy (with still some race condition possible, but far less).
On 11/14/14 12:30 AM, Pierre-Yves David wrote: > > > On 11/14/2014 06:08 AM, Gregory Szorc wrote: >> On 11/13/14 3:39 PM, Pierre-Yves David wrote: >>> # HG changeset patch >>> # User Pierre-Yves David <pierre-yves.david@fb.com> >>> # Date 1415152366 0 >>> # Wed Nov 05 01:52:46 2014 +0000 >>> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948 >>> # Parent e44399c494ab23cefb8c99087529f9b4b06d398e >>> transaction: change the on disk format for backupentries >> >> So I guess we aren't going with the solution that was brainstormed at >> the Munich sprint? (See "Transactions" in >> https://titanpad.com/mercurial32-sprint) >> >> Or are you going with an easy solution to transactions before the >> complicated one (that will support reader locks)? > > I'm starting for a simple solution first. The complexe solution > discussed at the munich sprint is "useless" if nobody is playing the > transaction game. So I first need to consolidate our transaction logic > to ensure basic property: > > - All contents written during the transaction is only made visible at > the end of the transaction > - All content written during the transaction is made available to hooks > and hooks only right before we commit. > - Any temporary file are properly cleaned up > - Any file involved in properly backed up in case of error. > > Once I've this true, I can move to the next step: Change the format to > ensure an atomic "write" of the data from a reader perspective. But I > needs some to write atomically in the first place ;-) Makes sense to me. I am very excited about seeing these patches. As you may know, cache invalidation causing an expensive rebuild is a big problem for large repositories. We are seeing these expensive rebuilds occur after a pretxnchangegroup hook fails a push, for example. I can't wait to deploy 3.3 at Mozilla! Of course, we still see cache invalidation after strip operations. We'll need memctx everywhere, hidden changesets enabled by default, and people to stop using strip-based workflows (mq) before this is fully resolved, I fear. But this transaction work is refreshing nonetheless.
On 11/14/2014 07:01 PM, Gregory Szorc wrote: > On 11/14/14 12:30 AM, Pierre-Yves David wrote: >> >> >> On 11/14/2014 06:08 AM, Gregory Szorc wrote: >>> On 11/13/14 3:39 PM, Pierre-Yves David wrote: >>>> # HG changeset patch >>>> # User Pierre-Yves David <pierre-yves.david@fb.com> >>>> # Date 1415152366 0 >>>> # Wed Nov 05 01:52:46 2014 +0000 >>>> # Node ID dba745ba04b9bc614d422e2e88b362a3bbda6948 >>>> # Parent e44399c494ab23cefb8c99087529f9b4b06d398e >>>> transaction: change the on disk format for backupentries >>> >>> So I guess we aren't going with the solution that was brainstormed at >>> the Munich sprint? (See "Transactions" in >>> https://titanpad.com/mercurial32-sprint) >>> >>> Or are you going with an easy solution to transactions before the >>> complicated one (that will support reader locks)? >> >> I'm starting for a simple solution first. The complexe solution >> discussed at the munich sprint is "useless" if nobody is playing the >> transaction game. So I first need to consolidate our transaction logic >> to ensure basic property: >> >> - All contents written during the transaction is only made visible at >> the end of the transaction >> - All content written during the transaction is made available to hooks >> and hooks only right before we commit. >> - Any temporary file are properly cleaned up >> - Any file involved in properly backed up in case of error. >> >> Once I've this true, I can move to the next step: Change the format to >> ensure an atomic "write" of the data from a reader perspective. But I >> needs some to write atomically in the first place ;-) > > Makes sense to me. > > I am very excited about seeing these patches. As you may know, cache > invalidation causing an expensive rebuild is a big problem for large > repositories. We are seeing these expensive rebuilds occur after a > pretxnchangegroup hook fails a push, for example. I can't wait to deploy > 3.3 at Mozilla! > > Of course, we still see cache invalidation after strip operations. We'll > need memctx everywhere, hidden changesets enabled by default, and people > to stop using strip-based workflows (mq) before this is fully resolved, > I fear. But this transaction work is refreshing nonetheless. Note: my current plan it to put the infrastructure in place (using the simplest cache we have to test it: hidden cache). I then expect an army of mozillian leprechaun to pop and do the work to move bigger cache in the transaction.
Patch
diff --git a/mercurial/transaction.py b/mercurial/transaction.py --- a/mercurial/transaction.py +++ b/mercurial/transaction.py @@ -13,11 +13,11 @@ from i18n import _ import errno import error, util -version = 1 +version = 0 def active(func): def _active(self, *args, **kwds): if self.count == 0: raise error.Abort(_( @@ -41,11 +41,11 @@ def _playback(journal, report, opener, e except (IOError, OSError), inst: if inst.errno != errno.ENOENT: raise backupfiles = [] - for f, b in backupentries: + for l, f, b, c in backupentries: if f and b: filepath = opener.join(f) backuppath = opener.join(b) try: util.copyfile(backuppath, filepath) @@ -97,13 +97,14 @@ class transaction(object): self._queue = [] # a dict of arguments to be passed to hooks self.hookargs = {} self.file = opener.open(self.journal, "w") - # a list of ('path', 'backuppath') entries. + # a list of ('location', 'path', 'backuppath', cache) entries. # if 'backuppath' is empty, no file existed at backup time # if 'path' is empty, this is a temporary transaction file + # (location, and cache are current unused) self._backupentries = [] self._backupmap = {} self._backupjournal = "%s.backupfiles" % journal self._backupsfile = opener.open(self._backupjournal, 'w') self._backupsfile.write('%d\n' % version) @@ -191,27 +192,27 @@ class transaction(object): backuppath = self.opener.join(backupfile) util.copyfiles(filepath, backuppath, hardlink=hardlink) else: backupfile = '' - self._addbackupentry((file, backupfile)) + self._addbackupentry(('', file, backupfile, False)) def _addbackupentry(self, entry): """register a new backup entry and write it to disk""" self._backupentries.append(entry) self._backupmap[file] = len(self._backupentries) - 1 - self._backupsfile.write("%s\0%s\n" % entry) + self._backupsfile.write("%s\0%s\0%s\0%d\n" % entry) self._backupsfile.flush() @active def registertmp(self, tmpfile): """register a temporary transaction file Such file will be delete when the transaction exit (on both failure and success). """ - self._addbackupentry(('', tmpfile)) + self._addbackupentry(('', '', tmpfile, False)) @active def addfilegenerator(self, genid, filenames, genfunc, order=0, vfs=None): """add a function to generates some files at transaction commit @@ -353,21 +354,21 @@ class transaction(object): if self.count != 0: return self.file.close() self._backupsfile.close() # cleanup temporary files - for f, b in self._backupentries: + for _l, f, b, _c in self._backupentries: if not f and b and self.opener.exists(b): self.opener.unlink(b) self.entries = [] if self.after: self.after() if self.opener.isfile(self.journal): self.opener.unlink(self.journal) if self.opener.isfile(self._backupjournal): self.opener.unlink(self._backupjournal) - for _f, b in self._backupentries: + for _l, _f, b, _c in self._backupentries: if b and self.opener.exists(b): self.opener.unlink(b) self._backupentries = [] self.journal = None # run post close action @@ -445,12 +446,12 @@ def rollback(opener, file, report): if ver == str(version): for line in lines[1:]: if line: # Shave off the trailing newline line = line[:-1] - f, b = line.split('\0') - backupentries.append((f, b)) + l, f, b, c = line.split('\0') + backupentries.append((l, f, b, bool(c))) else: - report(_("journal was created by a newer version of " + report(_("journal was created by a different version of " "Mercurial")) _playback(file, report, opener, entries, backupentries)