Submitter | Gregory Szorc |
---|---|
Date | Jan. 7, 2015, 6 a.m. |
Message ID | <5affbb31eea2a1034bc7.1420610418@3.1.168.192.in-addr.arpa> |
Download | mbox | patch |
Permalink | /patch/7346/ |
State | Accepted |
Commit | d486e52352e80426db6f3d738e35bb675b3c3fa2 |
Headers | show |
Comments
On Tue, 2015-01-06 at 22:00 -0800, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1420610193 28800 > # Tue Jan 06 21:56:33 2015 -0800 > # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 > # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 > transaction: support for callbacks during abort This is queued for default, thanks.
On 01/08/2015 10:21 AM, Matt Mackall wrote: > On Tue, 2015-01-06 at 22:00 -0800, Gregory Szorc wrote: >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com> >> # Date 1420610193 28800 >> # Tue Jan 06 21:56:33 2015 -0800 >> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 >> # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 >> transaction: support for callbacks during abort > > This is queued for default, thanks. The patch is good, but I recommend setting some was to have to run somewhere in the test suite if you want it to stay unbroken.
At Tue, 06 Jan 2015 22:00:18 -0800, Gregory Szorc wrote: > > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1420610193 28800 > # Tue Jan 06 21:56:33 2015 -0800 > # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 > # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 > transaction: support for callbacks during abort > > Previous transaction work added callbacks to be called during regular > transaction commit/close. As part of refactoring Mozilla's pushlog > extension (an extension that opens a SQLite database and tries to tie > its transaction semantics to Mercurial's transaction), I discovered that > the new transaction APIs were insufficient to avoid monkeypatching > transaction instance internals. Adding a callback that is called during > transaction abort removes the necessity for monkeypatching and completes > the API. > > diff --git a/mercurial/transaction.py b/mercurial/transaction.py > --- a/mercurial/transaction.py > +++ b/mercurial/transaction.py (snip) > @@ -442,8 +455,10 @@ class transaction(object): > > self.report(_("transaction abort!\n")) > > try: > + for cat in sorted(self._abortcallback): > + self._abortcallback[cat](self) > _playback(self.journal, self.report, self.opener, self._vfsmap, > self.entries, self._backupentries, False) > self.report(_("rollback completed\n")) > except Exception: (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook) Now, I'm working for dirstate consistency around inside and border of transaction, and planning to use this abort callback mechanism to know the end of transaction at failure. But current implementation doesn't invoke abort callbacks, if there is no change on repository data at aborting transaction: e.g. "hg import" fails to import the first patch. So, there is no easy and certain way to know the end of transaction at failure. BTW, this also causes that 'txnabort' hook isn't invoked at aborting transaction in such case, because 'txnabort' hook invocation is achieved by abort callback mechanism. This may need some cleanup for orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' will be invoked subsequently. Then, please let me confirm some points below before changing around 'transaction._abort()' for my purpose: - are there any usecases to expect actual changes to be rollbacked ? In other words, does it cause problem to invoke abort callbacks regardless of actual changes ? - are there any usecases to refer changes to be rollbacked in abort callbacks ? In other words, does it cause problem to invoke abort callbacks after rollbacking ? BTW, 'txnabort' hooks are invoked without 'pending', and it means that such changes should be invisible to 'txnabort' hooks, doesn't it ? > Pierre-Yves Then, IMHO, such changes should be invisible to abort callbacks, too. Maybe, should I introduce another callback to know the end of transaction (regardless of the result of it, or only at failure) ? ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
On Sat, Jul 11, 2015 at 3:31 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote: > At Tue, 06 Jan 2015 22:00:18 -0800, > Gregory Szorc wrote: > > > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com> > > # Date 1420610193 28800 > > # Tue Jan 06 21:56:33 2015 -0800 > > # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 > > # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 > > transaction: support for callbacks during abort > > > > Previous transaction work added callbacks to be called during regular > > transaction commit/close. As part of refactoring Mozilla's pushlog > > extension (an extension that opens a SQLite database and tries to tie > > its transaction semantics to Mercurial's transaction), I discovered that > > the new transaction APIs were insufficient to avoid monkeypatching > > transaction instance internals. Adding a callback that is called during > > transaction abort removes the necessity for monkeypatching and completes > > the API. > > > > diff --git a/mercurial/transaction.py b/mercurial/transaction.py > > --- a/mercurial/transaction.py > > +++ b/mercurial/transaction.py > > (snip) > > > @@ -442,8 +455,10 @@ class transaction(object): > > > > self.report(_("transaction abort!\n")) > > > > try: > > + for cat in sorted(self._abortcallback): > > + self._abortcallback[cat](self) > > _playback(self.journal, self.report, self.opener, > self._vfsmap, > > self.entries, self._backupentries, False) > > self.report(_("rollback completed\n")) > > except Exception: > > (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook) > > Now, I'm working for dirstate consistency around inside and border of > transaction, and planning to use this abort callback mechanism to know > the end of transaction at failure. > > But current implementation doesn't invoke abort callbacks, if there is > no change on repository data at aborting transaction: e.g. "hg import" > fails to import the first patch. > > So, there is no easy and certain way to know the end of transaction at > failure. > > BTW, this also causes that 'txnabort' hook isn't invoked at aborting > transaction in such case, because 'txnabort' hook invocation is > achieved by abort callback mechanism. This may need some cleanup for > orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' > will be invoked subsequently. > > Then, please let me confirm some points below before changing around > 'transaction._abort()' for my purpose: > > - are there any usecases to expect actual changes to be rollbacked ? > > In other words, does it cause problem to invoke abort callbacks > regardless of actual changes ? > Mozilla has a use case in our pushlog extension ( https://hg.mozilla.org/hgcustom/version-control-tools/file/45f5a094da62/hgext/pushlog/__init__.py#l221). Not sure if there is anything in core. Possibly not. > > - are there any usecases to refer changes to be rollbacked in abort > callbacks ? > > In other words, does it cause problem to invoke abort callbacks > after rollbacking ? > I don't think it matters if registered rollbacks are called before or after core rollbacks. But, I think it is important that the "rollback completed" message is printed after all rollbacks are performed. Otherwise, you could see weird output. e.g. adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files transaction abort! rollback completed rolling back message from extension > BTW, 'txnabort' hooks are invoked without 'pending', and it means > that such changes should be invisible to 'txnabort' hooks, doesn't > it ? > Pierre-Yves > > Then, IMHO, such changes should be invisible to abort callbacks, > too. > > Maybe, should I introduce another callback to know the end of > transaction (regardless of the result of it, or only at failure) ? > I think that would be a nice addition to the API and would round out the callback support nicely. This could be used for better forensic logging, for example.
On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > At Tue, 06 Jan 2015 22:00:18 -0800, > Gregory Szorc wrote: >> >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com> >> # Date 1420610193 28800 >> # Tue Jan 06 21:56:33 2015 -0800 >> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 >> # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 >> transaction: support for callbacks during abort >> >> Previous transaction work added callbacks to be called during regular >> transaction commit/close. As part of refactoring Mozilla's pushlog >> extension (an extension that opens a SQLite database and tries to tie >> its transaction semantics to Mercurial's transaction), I discovered that >> the new transaction APIs were insufficient to avoid monkeypatching >> transaction instance internals. Adding a callback that is called during >> transaction abort removes the necessity for monkeypatching and completes >> the API. >> >> diff --git a/mercurial/transaction.py b/mercurial/transaction.py >> --- a/mercurial/transaction.py >> +++ b/mercurial/transaction.py > > (snip) > >> @@ -442,8 +455,10 @@ class transaction(object): >> >> self.report(_("transaction abort!\n")) >> >> try: >> + for cat in sorted(self._abortcallback): >> + self._abortcallback[cat](self) >> _playback(self.journal, self.report, self.opener, self._vfsmap, >> self.entries, self._backupentries, False) >> self.report(_("rollback completed\n")) >> except Exception: > > (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook) > > Now, I'm working for dirstate consistency around inside and border of > transaction, and planning to use this abort callback mechanism to know > the end of transaction at failure. I'm a bit curious about what code you need to run in such case. If you use the file generator mechanism you do not need to worries about restoring backup. However, I'm not sure how rolling back the file would atomatically invalide the in memory object content (probably not). Is this why you need a callback? could we (would it make sense to) integrate that as part of one of our current cache invalidation mechanism? > But current implementation doesn't invoke abort callbacks, if there is > no change on repository data at aborting transaction: e.g. "hg import" > fails to import the first patch. The change to dirstate should most probably register himself as change to the transaction. That would trigger it. > So, there is no easy and certain way to know the end of transaction at > failure. > > BTW, this also causes that 'txnabort' hook isn't invoked at aborting > transaction in such case, because 'txnabort' hook invocation is > achieved by abort callback mechanism. This may need some cleanup for > orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' > will be invoked subsequently. > > Then, please let me confirm some points below before changing around > 'transaction._abort()' for my purpose: > > - are there any usecases to expect actual changes to be rollbacked ? > > In other words, does it cause problem to invoke abort callbacks > regardless of actual changes ? If I remember correctly, it make "transaction abort" "abort complete message to happears in a lot of new place. Basically any place where we open a transaction but eventually do nothing. We can probably not change it > - are there any usecases to refer changes to be rollbacked in abort > callbacks ? > > In other words, does it cause problem to invoke abort callbacks > after rollbacking ? I would says yes, but detail could be complicated (to be continued) > > BTW, 'txnabort' hooks are invoked without 'pending', and it means > that such changes should be invisible to 'txnabort' hooks, doesn't > it ? > Pierre-Yves And here we have proof of (1) people do not rely of it too much (2) definition of the state of the transaciton during abort is far from trivial. We should probably keep it with rolled back content/ > > Then, IMHO, such changes should be invisible to abort callbacks, > too. > > Maybe, should I introduce another callback to know the end of > transaction (regardless of the result of it, or only at failure) ? Sounds like an awesome idea. Python have: try: .... # equivalent to transaction open except: .... # equivalent to transaction abort else: .... # equivalent to transaction success finally: .... # equivalent of your proposal Finally is definitly very useful, so go for it.
At Sat, 11 Jul 2015 20:47:00 +0100, Pierre-Yves David wrote: > On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > > At Tue, 06 Jan 2015 22:00:18 -0800, > > Gregory Szorc wrote: > >> > >> # HG changeset patch > >> # User Gregory Szorc <gregory.szorc@gmail.com> > >> # Date 1420610193 28800 > >> # Tue Jan 06 21:56:33 2015 -0800 > >> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36 > >> # Parent b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40 > >> transaction: support for callbacks during abort > >> > >> Previous transaction work added callbacks to be called during regular > >> transaction commit/close. As part of refactoring Mozilla's pushlog > >> extension (an extension that opens a SQLite database and tries to tie > >> its transaction semantics to Mercurial's transaction), I discovered that > >> the new transaction APIs were insufficient to avoid monkeypatching > >> transaction instance internals. Adding a callback that is called during > >> transaction abort removes the necessity for monkeypatching and completes > >> the API. > >> > >> diff --git a/mercurial/transaction.py b/mercurial/transaction.py > >> --- a/mercurial/transaction.py > >> +++ b/mercurial/transaction.py > > > > (snip) > > > >> @@ -442,8 +455,10 @@ class transaction(object): > >> > >> self.report(_("transaction abort!\n")) > >> > >> try: > >> + for cat in sorted(self._abortcallback): > >> + self._abortcallback[cat](self) > >> _playback(self.journal, self.report, self.opener, self._vfsmap, > >> self.entries, self._backupentries, False) > >> self.report(_("rollback completed\n")) > >> except Exception: > > > > (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook) Thank you for your comments, Gregory and Pierre-Yves. I'll work to add another callback to know the end of transaction, as described later. So, some of comments below are just for your information. > > Now, I'm working for dirstate consistency around inside and border of > > transaction, and planning to use this abort callback mechanism to know > > the end of transaction at failure. > > I'm a bit curious about what code you need to run in such case. If you > use the file generator mechanism you do not need to worries about > restoring backup. > > However, I'm not sure how rolling back the file would atomatically > invalide the in memory object content (probably not). Is this why you > need a callback? could we (would it make sense to) integrate that as > part of one of our current cache invalidation mechanism? I'm planning to use callback mechanism for purposes below: 1. to make existing 'dirstate.write()' invocations write in-memory changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), 'dirstate.write()' should know whether transaction is running or not I'm planning changes below for this purpose: - invoke (newly added) 'dirstate.begintransaction()' at the beginning of transaction in 'localrepo.transaction()' - invoke (newly added) 'dirstate.endtransaction()' at the end of transaction via "the end of transaction" callback (instead of finalize/abort callbacks) There are some other ways: 1-1. instantiate dirstate with 'repo' (or weak ref of repo to avoid circular reference ?) 1-2. make 'dirstate.write()' require 'localrepo' argument This seems too large impact (especially for 3rd party extensions, even though Mercurial doesn't ensure compatibility of internal API at all :-)) In both cases, 'dirstate.write()' can determine appropriate output file by examination of 'repo.currenttransaction()'. 2. in-memory dirstate changes should be discarded at failure of transaction, to prevent them from being written out at (outer) 'wlock.release' or so I'm planning to do this via "the end of transaction" callback. Using 'dirstate.{begin|end}parentchange()' mechanism as below also works similarly. - invoke beginparentchange at starting transaction - invoke endparentchange at (successfully) closing transaction - wlock.release at failure invokes 'dirstate.invalidate()' But it seems too optimistic for outside transaction :-), and it can't ensure consistency just at the end of transaction. > > But current implementation doesn't invoke abort callbacks, if there is > > no change on repository data at aborting transaction: e.g. "hg import" > > fails to import the first patch. > > The change to dirstate should most probably register himself as change > to the transaction. That would trigger it. I found two problems of 'addfilegenerator()' below. IMHO, the former is a major problem, and the latter is minor one (but you should dislike it as later :-)). And, both of my fixing plans cause omitting invocation of abort callbacks, because they makes 'transaction._backupentries' empty :-< - current 'addfilegenerator()' causes forcibly restoring '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change parents of the working directory This causes unexpected "hg status" output at rollback-ing in such situation (aka non-"parentgone" case). I'm planning to introduce 'backup' optional argument, which controls 'addbackup()' invocation in '_generatefiles()' to avoid issue above, to 'addfilegenerator()'. https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267 According to discussion before, dirstate writes changes into (a) '.hg/dirstate' at the beginning of transaction and (b) '.hg/dirstate.pending' while transaction running. Then, discarding the latter can work as a kind of restoring. So, omitting 'addbackup()' invocation doesn't cause problem at least for dirstate. Or is another trick like "add to _backupentries, but omit backup itself" better ? - current 'addfilegenerator()' causes forcibly writing data out at 'writepending()', even if data hasn't been changed since last writing out When external hooks are invoked while transaction, this causes additional "transaction abort!" and "rollback completed" output, even though dirstate hasn't been changed since the beginning of transaction. For example, pushing changes/bookmarks causes writing into 'dirstate.pending' at REMOTE side, even though it never uses and changes dirstate. I'm planning to introduce 'checkdirty' optional argument, which is used to examine whether data has been changed since last writing out (= 'genfunc' should be invoked or not), to avoid redundant writing data out. > > So, there is no easy and certain way to know the end of transaction at > > failure. > > > > BTW, this also causes that 'txnabort' hook isn't invoked at aborting > > transaction in such case, because 'txnabort' hook invocation is > > achieved by abort callback mechanism. This may need some cleanup for > > orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' > > will be invoked subsequently. > > > > Then, please let me confirm some points below before changing around > > 'transaction._abort()' for my purpose: > > > > - are there any usecases to expect actual changes to be rollbacked ? > > > > In other words, does it cause problem to invoke abort callbacks > > regardless of actual changes ? > > If I remember correctly, it make "transaction abort" "abort complete > message to happears in a lot of new place. Basically any place where we > open a transaction but eventually do nothing. We can probably not change it Adding invocation of abort callbacks just before cleaning journal files up can avoid such additional messages, even though duplicated code paths are redundant :-) https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477 diff --git a/mercurial/transaction.py b/mercurial/transaction.py --- a/mercurial/transaction.py +++ b/mercurial/transaction.py @@ -482,6 +482,8 @@ class transaction(object): try: if not self.entries and not self._backupentries: + for cat in sorted(self._abortcallback): + self._abortcallback[cat](self) if self.journal: self.opener.unlink(self.journal) if self._backupjournal: > > - are there any usecases to refer changes to be rollbacked in abort > > callbacks ? > > > > In other words, does it cause problem to invoke abort callbacks > > after rollbacking ? > > I would says yes, but detail could be complicated (to be continued) > > > > > BTW, 'txnabort' hooks are invoked without 'pending', and it means > > that such changes should be invisible to 'txnabort' hooks, doesn't > > it ? > Pierre-Yves > > And here we have proof of (1) people do not rely of it too much (2) > definition of the state of the transaciton during abort is far from trivial. > We should probably keep it with rolled back content/ It means that 'txnabort' hooks should be invoked with 'pending' (= rolled back content should be visible to them), doen't it ? (sorry for my poor English, if I misunderstood :-<) > > > > Then, IMHO, such changes should be invisible to abort callbacks, > > too. > > > > Maybe, should I introduce another callback to know the end of > > transaction (regardless of the result of it, or only at failure) ? > > Sounds like an awesome idea. Python have: > > try: > .... # equivalent to transaction open > except: > .... # equivalent to transaction abort > else: > .... # equivalent to transaction success > finally: > .... # equivalent of your proposal > > Finally is definitly very useful, so go for it. I'll do so. > -- > Pierre-Yves David > > > > > > > > > > > > ---------------------------------------------------------------------- > > [FUJIWARA Katsunori] foozy@lares.dti.ne.jp > > > > -- > Pierre-Yves David > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: > > > At Sat, 11 Jul 2015 20:47:00 +0100, > Pierre-Yves David wrote: > >> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: >>> Now, I'm working for dirstate consistency around inside and border of >>> transaction, and planning to use this abort callback mechanism to know >>> the end of transaction at failure. >> >> I'm a bit curious about what code you need to run in such case. If you >> use the file generator mechanism you do not need to worries about >> restoring backup. >> >> However, I'm not sure how rolling back the file would atomatically >> invalide the in memory object content (probably not). Is this why you >> need a callback? could we (would it make sense to) integrate that as >> part of one of our current cache invalidation mechanism? > > > I'm planning to use callback mechanism for purposes below: > > 1. to make existing 'dirstate.write()' invocations write in-memory > changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), > 'dirstate.write()' should know whether transaction is running or not > > I'm planning changes below for this purpose: > > - invoke (newly added) 'dirstate.begintransaction()' at the > beginning of transaction in 'localrepo.transaction()' > > - invoke (newly added) 'dirstate.endtransaction()' at the end of > transaction via "the end of transaction" callback (instead of > finalize/abort callbacks) > > > There are some other ways: > > 1-1. instantiate dirstate with 'repo' > (or weak ref of repo to avoid circular reference ?) > > 1-2. make 'dirstate.write()' require 'localrepo' argument > > This seems too large impact (especially for 3rd party > extensions, even though Mercurial doesn't ensure compatibility > of internal API at all :-)) > > In both cases, 'dirstate.write()' can determine appropriate output > file by examination of 'repo.currenttransaction()'. I would prefer one of this options. Using hooks for such core code looks a bit hacky and fragile. I would be happy with wither 1-1 (weak-ref please) or 1-2. With a preference for 1-2. You can make the repo argument optional, with a devel warning if it is missing for example. To avoid too much breakage for a couple of version. > 2. in-memory dirstate changes should be discarded at failure of > transaction, to prevent them from being written out at (outer) > 'wlock.release' or so > > I'm planning to do this via "the end of transaction" callback. > > > Using 'dirstate.{begin|end}parentchange()' mechanism as below also > works similarly. > > - invoke beginparentchange at starting transaction > - invoke endparentchange at (successfully) closing transaction > - wlock.release at failure invokes 'dirstate.invalidate()' > > But it seems too optimistic for outside transaction :-), and it > can't ensure consistency just at the end of transaction. I'm a bit confused about why you would invoe dirstate.invalidate() at lock release. Can't we have something invalidating the dirstate content when transaction if rolledback (so that dirstate is read from disk? Do we have something forcing dirstate flushing at the transaction openning? (so that the backup we restore is okay?) >>> But current implementation doesn't invoke abort callbacks, if there is >>> no change on repository data at aborting transaction: e.g. "hg import" >>> fails to import the first patch. >> >> The change to dirstate should most probably register himself as change >> to the transaction. That would trigger it. > > I found two problems of 'addfilegenerator()' below. IMHO, the former > is a major problem, and the latter is minor one (but you should > dislike it as later :-)). If addfilegenerator fails to fit our needs here, we should probably hammer it so it does. > And, both of my fixing plans cause omitting invocation of abort > callbacks, because they makes 'transaction._backupentries' empty :-< > > - current 'addfilegenerator()' causes forcibly restoring > '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change > parents of the working directory > > This causes unexpected "hg status" output at rollback-ing in such > situation (aka non-"parentgone" case). Do you mean there is category of change to dirstate that must not be rolledback if transaction fails? I do not see data about this on the wikipage (but I could be blind) Can you elaborate? I'm not sure to understand the issue here. > I'm planning to introduce 'backup' optional argument, which controls > 'addbackup()' invocation in '_generatefiles()' to avoid issue above, > to 'addfilegenerator()'. > > https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267 That sound wrong to me. We either have change to be apply on transaction success and rolledback on transaction failure. But I can't see case were backup will be wrong. > According to discussion before, dirstate writes changes into (a) > '.hg/dirstate' at the beginning of transaction and (b) > '.hg/dirstate.pending' while transaction running. Then, discarding > the latter can work as a kind of restoring. So, omitting > 'addbackup()' invocation doesn't cause problem at least for dirstate. > > Or is another trick like "add to _backupentries, but omit backup > itself" better ? The backup are also used for rollback (and hg recover). We must keep it. > > - current 'addfilegenerator()' causes forcibly writing data out at > 'writepending()', even if data hasn't been changed since last > writing out > > When external hooks are invoked while transaction, this causes > additional "transaction abort!" and "rollback completed" output, > even though dirstate hasn't been changed since the beginning of > transaction. > > For example, pushing changes/bookmarks causes writing into > 'dirstate.pending' at REMOTE side, even though it never uses and > changes dirstate. If there is no change to the dirstate. Why are we recording any change into the transaction at all? > I'm planning to introduce 'checkdirty' optional argument, which is > used to examine whether data has been changed since last writing > out (= 'genfunc' should be invoked or not), to avoid redundant > writing data out. Yes, this way of detecting dirtyness have been bothering me in the past. I've a small series to clean this us, let me undust it and patchbomb it. >>> So, there is no easy and certain way to know the end of transaction at >>> failure. >>> >>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting >>> transaction in such case, because 'txnabort' hook invocation is >>> achieved by abort callback mechanism. This may need some cleanup for >>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' >>> will be invoked subsequently. >>> >>> Then, please let me confirm some points below before changing around >>> 'transaction._abort()' for my purpose: >>> >>> - are there any usecases to expect actual changes to be rollbacked ? >>> >>> In other words, does it cause problem to invoke abort callbacks >>> regardless of actual changes ? >> >> If I remember correctly, it make "transaction abort" "abort complete >> message to happears in a lot of new place. Basically any place where we >> open a transaction but eventually do nothing. We can probably not change it > > Adding invocation of abort callbacks just before cleaning journal > files up can avoid such additional messages, even though duplicated > code paths are redundant :-) > > https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477 > > diff --git a/mercurial/transaction.py b/mercurial/transaction.py > --- a/mercurial/transaction.py > +++ b/mercurial/transaction.py > @@ -482,6 +482,8 @@ class transaction(object): > > try: > if not self.entries and not self._backupentries: > + for cat in sorted(self._abortcallback): > + self._abortcallback[cat](self) > if self.journal: > self.opener.unlink(self.journal) > if self._backupjournal: I do not understand what we are trying to achieve here. Having a wider definition of abort (abort apply even if transaction is empty) seems a dubious semantic and will have huge impact in the UI. > > >>> - are there any usecases to refer changes to be rollbacked in abort >>> callbacks ? >>> >>> In other words, does it cause problem to invoke abort callbacks >>> after rollbacking ? >> >> I would says yes, but detail could be complicated (to be continued) >> >>> >>> BTW, 'txnabort' hooks are invoked without 'pending', and it means >>> that such changes should be invisible to 'txnabort' hooks, doesn't >>> it ? > Pierre-Yves >> >> And here we have proof of (1) people do not rely of it too much (2) >> definition of the state of the transaction during abort is far from trivial. >> We should probably keep it with rolled back content/ > > It means that 'txnabort' hooks should be invoked with 'pending' (= > rolled back content should be visible to them), doen't it ? (sorry for > my poor English, if I misunderstood :-<) No. It means that: 1) Nobody complains about the lack of pending so far (since we do not have it). 2) I'm not sure we can have anything sensible done here (reading pending with a transaction half applied, will be bad) Conclusion: we should keep not having pending there. >>> Then, IMHO, such changes should be invisible to abort callbacks, >>> too. >>> >>> Maybe, should I introduce another callback to know the end of >>> transaction (regardless of the result of it, or only at failure) ? >> >> Sounds like an awesome idea. Python have: >> >> try: >> .... # equivalent to transaction open >> except: >> .... # equivalent to transaction abort >> else: >> .... # equivalent to transaction success >> finally: >> .... # equivalent of your proposal >> >> Finally is definitely very useful, so go for it. > > I'll do so. Cool, (but It is not clear it is on the critical path, so no rush).
At Tue, 14 Jul 2015 13:26:14 +0200, Pierre-Yves David wrote: > > On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: > > > > > > At Sat, 11 Jul 2015 20:47:00 +0100, > > Pierre-Yves David wrote: > > > >> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > >>> Now, I'm working for dirstate consistency around inside and border of > >>> transaction, and planning to use this abort callback mechanism to know > >>> the end of transaction at failure. > >> > >> I'm a bit curious about what code you need to run in such case. If you > >> use the file generator mechanism you do not need to worries about > >> restoring backup. > >> > >> However, I'm not sure how rolling back the file would atomatically > >> invalide the in memory object content (probably not). Is this why you > >> need a callback? could we (would it make sense to) integrate that as > >> part of one of our current cache invalidation mechanism? > > > > > > I'm planning to use callback mechanism for purposes below: > > > > 1. to make existing 'dirstate.write()' invocations write in-memory > > changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), > > 'dirstate.write()' should know whether transaction is running or not > > > > I'm planning changes below for this purpose: > > > > - invoke (newly added) 'dirstate.begintransaction()' at the > > beginning of transaction in 'localrepo.transaction()' > > > > - invoke (newly added) 'dirstate.endtransaction()' at the end of > > transaction via "the end of transaction" callback (instead of > > finalize/abort callbacks) > > > > > > There are some other ways: > > > > 1-1. instantiate dirstate with 'repo' > > (or weak ref of repo to avoid circular reference ?) > > > > 1-2. make 'dirstate.write()' require 'localrepo' argument > > > > This seems too large impact (especially for 3rd party > > extensions, even though Mercurial doesn't ensure compatibility > > of internal API at all :-)) > > > > In both cases, 'dirstate.write()' can determine appropriate output > > file by examination of 'repo.currenttransaction()'. > > I would prefer one of this options. Using hooks for such core code looks > a bit hacky and fragile. How about something new like 'unlock' of 'lock' (or 'after' of transaction) ? Would you think this kind of 'call back' mechanism itself is also fragile for core code (if newly added) ? > I would be happy with wither 1-1 (weak-ref > please) or 1-2. With a preference for 1-2. > > You can make the repo argument optional, with a devel warning if it is > missing for example. To avoid too much breakage for a couple of version. BTW, what is your strong reason to choose 1-2 ? to avoid using weak ref of repo ? > > 2. in-memory dirstate changes should be discarded at failure of > > transaction, to prevent them from being written out at (outer) > > 'wlock.release' or so > > > > I'm planning to do this via "the end of transaction" callback. > > > > > > Using 'dirstate.{begin|end}parentchange()' mechanism as below also > > works similarly. > > > > - invoke beginparentchange at starting transaction > > - invoke endparentchange at (successfully) closing transaction > > - wlock.release at failure invokes 'dirstate.invalidate()' > > > > But it seems too optimistic for outside transaction :-), and it > > can't ensure consistency just at the end of transaction. > > I'm a bit confused about why you would invoe dirstate.invalidate() at > lock release. Can't we have something invalidating the dirstate content > when transaction if rolledback (so that dirstate is read from disk? (at first, 'dirstate.{begin|end}parentchange()' example above is just a example to do similar thing in another way, but not what I want to do) > Do we have something forcing dirstate flushing at the transaction > openning? (so that the backup we restore is okay?) AFAIK, with current implementation: - in-memory dirstate changes aren't explicitly written out at opening transaction In almost all cases, in-memory dirstate changes seem to be INDIRECTLY written out before transaction opening. But it isn't explicitly ensured. This causes: 1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at opening transaction, and 2. restoring one from (incorrect) '.hg/undo.dirstate', which is renamed from '.hg/journal.dirstate' above, at rollback (Oops, I have overlooked that journal file is created not from dirstate instance but from '.hg/dirstate' file at opening transaction :-< we have to fix this at first) For example, at "hg backout --commit", dirstate changes via reverting aren't written out until 'ctx.markcommitted()' after closing the transaction. And, "hg status" after "hg rollback" shows unexpected status for added/removed files. BTW, after my previous change fe03f522dda9, this issue can be reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing on some unsure files :-) https://selenic.com/repo/hg/rev/fe03f522dda9 - in-memory dirstate changes aren't explicitly discarded at aborting transaction (for convenience, let's assume that '.hg/journal.dirstate' is correct, at this point) As you described, aborting transaction restores '.hg/dirstate' from '.hg/journal.dirstate' saved at opening transaction. But in-memory changes during transaction are still kept in 'dirstate' instance, and they may be written into already restored '.hg/dirstate' at 'wlock.release()' or so. BTW, commands almost fully enclosed by transaction treat "in-memory changes during aborted transaction" in ways below: - in-memory changes are discarded by 'dirstateguard' OUTSIDE transaction scope ('hg qpush', 'hg commit --amend', and 'hg import') "discarding in-memory dirstate changes at aborting transaction" will make dirstateguard in these cases useless. - in-memory changes should be discarded, but not ('hg transplant') Aborting for other than conflict may cause unexpected dirstate (e.g. referring rollbacked revision as parent). This should be fixed. - discarding in-memory changes causes problem ('hg shelve' and 'hg unshelve') Current shelve/unshelve implementation expects aborting transaction NOT to discard in-memory dirstate changes intentionally or unintentionally. 1. start transaction 2. do shelve/unshelve, and this makes temporary revisions 3. update working directory to the parent revision at (1) (+ some additional actions) - dirstate is different from one at (1), because some changes are shelved/unshelved 4. abort transaction intentionally - rollback revisions created at (2) - restore '.hg/dirstate' from one at (1) 5. write dirstate at 'wlock.release' - here, '.hg/dirstate' is equal to one at (3) IMHO, "strip by aborting current transaction" at (4) above should be replaced by "strip by rollbacking the transaction after once closing it". The latter doesn't strip the parent of the working directory at stripping because of updating at (3): this is non-"parentgone" case. Then, dirstate is kept as same as one at (3) above. Let me confirm how dirstate should be treated at the border of transaction scope. - at opening transaction: - write in-memory dirstate changes into '.hg/dirstate' (not yet) - at successfully closing transaction: - write changes during transaction into '.hg/dirstate' (not yet) - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK) - at aborting transaction: - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK) - discard in-memory dirstate changes (not yet) > >>> But current implementation doesn't invoke abort callbacks, if there is > >>> no change on repository data at aborting transaction: e.g. "hg import" > >>> fails to import the first patch. > >> > >> The change to dirstate should most probably register himself as change > >> to the transaction. That would trigger it. > > > > I found two problems of 'addfilegenerator()' below. IMHO, the former > > is a major problem, and the latter is minor one (but you should > > dislike it as later :-)). > > If addfilegenerator fails to fit our needs here, we should probably > hammer it so it does. > > > And, both of my fixing plans cause omitting invocation of abort > > callbacks, because they makes 'transaction._backupentries' empty :-< > > > > - current 'addfilegenerator()' causes forcibly restoring > > '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change > > parents of the working directory > > > > This causes unexpected "hg status" output at rollback-ing in such > > situation (aka non-"parentgone" case). > > Do you mean there is category of change to dirstate that must not be > rolledback if transaction fails? I do not see data about this on the > wikipage (but I could be blind) Can you elaborate? > > I'm not sure to understand the issue here. "rollback" above is not one by aborting while transaction running, but one by "hg rollback"/"repo.rollback()" after once closing transaction. In the later case, parents of the working directory may differ from one at opening transaction, and forcible restoring dirstate causes unexpected result of subsequent "hg status". Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be restored at rollback, and it causes forcible restoring. 'repo.rollback()' avoids this problem by restoring dirstate by itself. https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110 > > I'm planning to introduce 'backup' optional argument, which controls > > 'addbackup()' invocation in '_generatefiles()' to avoid issue above, > > to 'addfilegenerator()'. > > > > https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267 > > That sound wrong to me. We either have change to be apply on transaction > success and rolledback on transaction failure. But I can't see case were > backup will be wrong. > > > > According to discussion before, dirstate writes changes into (a) > > '.hg/dirstate' at the beginning of transaction and (b) > > '.hg/dirstate.pending' while transaction running. Then, discarding > > the latter can work as a kind of restoring. So, omitting > > 'addbackup()' invocation doesn't cause problem at least for dirstate. > > > > Or is another trick like "add to _backupentries, but omit backup > > itself" better ? > > The backup are also used for rollback (and hg recover). We must keep it. > > > > > - current 'addfilegenerator()' causes forcibly writing data out at > > 'writepending()', even if data hasn't been changed since last > > writing out > > > > When external hooks are invoked while transaction, this causes > > additional "transaction abort!" and "rollback completed" output, > > even though dirstate hasn't been changed since the beginning of > > transaction. > > > > For example, pushing changes/bookmarks causes writing into > > 'dirstate.pending' at REMOTE side, even though it never uses and > > changes dirstate. > > If there is no change to the dirstate. Why are we recording any change > into the transaction at all? I understood that 'addfilegenerator()' is used for files which may be changed during transaction like "journal", and invoked it for dirstate at opening transaction. Should I use it after examination whether dirstate has been changed since opening transaction ? > > I'm planning to introduce 'checkdirty' optional argument, which is > > used to examine whether data has been changed since last writing > > out (= 'genfunc' should be invoked or not), to avoid redundant > > writing data out. > > Yes, this way of detecting dirtyness have been bothering me in the past. > I've a small series to clean this us, let me undust it and patchbomb it. > > >>> So, there is no easy and certain way to know the end of transaction at > >>> failure. > >>> > >>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting > >>> transaction in such case, because 'txnabort' hook invocation is > >>> achieved by abort callback mechanism. This may need some cleanup for > >>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort' > >>> will be invoked subsequently. > >>> > >>> Then, please let me confirm some points below before changing around > >>> 'transaction._abort()' for my purpose: > >>> > >>> - are there any usecases to expect actual changes to be rollbacked ? > >>> > >>> In other words, does it cause problem to invoke abort callbacks > >>> regardless of actual changes ? > >> > >> If I remember correctly, it make "transaction abort" "abort complete > >> message to happears in a lot of new place. Basically any place where we > >> open a transaction but eventually do nothing. We can probably not change it > > > > Adding invocation of abort callbacks just before cleaning journal > > files up can avoid such additional messages, even though duplicated > > code paths are redundant :-) > > > > https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477 > > > > diff --git a/mercurial/transaction.py b/mercurial/transaction.py > > --- a/mercurial/transaction.py > > +++ b/mercurial/transaction.py > > @@ -482,6 +482,8 @@ class transaction(object): > > > > try: > > if not self.entries and not self._backupentries: > > + for cat in sorted(self._abortcallback): > > + self._abortcallback[cat](self) > > if self.journal: > > self.opener.unlink(self.journal) > > if self._backupjournal: > > I do not understand what we are trying to achieve here. Having a wider > definition of abort (abort apply even if transaction is empty) seems a > dubious semantic and will have huge impact in the UI. > > > > > > > >>> - are there any usecases to refer changes to be rollbacked in abort > >>> callbacks ? > >>> > >>> In other words, does it cause problem to invoke abort callbacks > >>> after rollbacking ? > >> > >> I would says yes, but detail could be complicated (to be continued) > >> > >>> > >>> BTW, 'txnabort' hooks are invoked without 'pending', and it means > >>> that such changes should be invisible to 'txnabort' hooks, doesn't > >>> it ? > Pierre-Yves > >> > >> And here we have proof of (1) people do not rely of it too much (2) > >> definition of the state of the transaction during abort is far from trivial. > >> We should probably keep it with rolled back content/ > > > > It means that 'txnabort' hooks should be invoked with 'pending' (= > > rolled back content should be visible to them), doen't it ? (sorry for > > my poor English, if I misunderstood :-<) > > No. It means that: > > 1) Nobody complains about the lack of pending so far (since we do not > have it). > > 2) I'm not sure we can have anything sensible done here (reading pending > with a transaction half applied, will be bad) > > Conclusion: we should keep not having pending there. OK, I see. > >>> Then, IMHO, such changes should be invisible to abort callbacks, > >>> too. > >>> > >>> Maybe, should I introduce another callback to know the end of > >>> transaction (regardless of the result of it, or only at failure) ? > >> > >> Sounds like an awesome idea. Python have: > >> > >> try: > >> .... # equivalent to transaction open > >> except: > >> .... # equivalent to transaction abort > >> else: > >> .... # equivalent to transaction success > >> finally: > >> .... # equivalent of your proposal > >> > >> Finally is definitely very useful, so go for it. > > > > I'll do so. > > Cool, (but It is not clear it is on the critical path, so no rush). > > -- > Pierre-Yves David > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote: > > At Tue, 14 Jul 2015 13:26:14 +0200, > Pierre-Yves David wrote: >> >> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: >>> >>> >>> At Sat, 11 Jul 2015 20:47:00 +0100, >>> Pierre-Yves David wrote: >>> >>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: >>>>> Now, I'm working for dirstate consistency around inside and border of >>>>> transaction, and planning to use this abort callback mechanism to know >>>>> the end of transaction at failure. >>>> >>>> I'm a bit curious about what code you need to run in such case. If you >>>> use the file generator mechanism you do not need to worries about >>>> restoring backup. >>>> >>>> However, I'm not sure how rolling back the file would atomatically >>>> invalide the in memory object content (probably not). Is this why you >>>> need a callback? could we (would it make sense to) integrate that as >>>> part of one of our current cache invalidation mechanism? >>> >>> >>> I'm planning to use callback mechanism for purposes below: >>> >>> 1. to make existing 'dirstate.write()' invocations write in-memory >>> changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), >>> 'dirstate.write()' should know whether transaction is running or not >>> >>> I'm planning changes below for this purpose: >>> >>> - invoke (newly added) 'dirstate.begintransaction()' at the >>> beginning of transaction in 'localrepo.transaction()' >>> >>> - invoke (newly added) 'dirstate.endtransaction()' at the end of >>> transaction via "the end of transaction" callback (instead of >>> finalize/abort callbacks) >>> >>> >>> There are some other ways: >>> >>> 1-1. instantiate dirstate with 'repo' >>> (or weak ref of repo to avoid circular reference ?) >>> >>> 1-2. make 'dirstate.write()' require 'localrepo' argument >>> >>> This seems too large impact (especially for 3rd party >>> extensions, even though Mercurial doesn't ensure compatibility >>> of internal API at all :-)) >>> >>> In both cases, 'dirstate.write()' can determine appropriate output >>> file by examination of 'repo.currenttransaction()'. >> >> I would prefer one of this options. Using hooks for such core code looks >> a bit hacky and fragile. > > How about something new like 'unlock' of 'lock' (or 'after' of > transaction) ? Would you think this kind of 'call back' mechanism > itself is also fragile for core code (if newly added) ? I do not understand what you mean. I'm worried about hook approach being "fragile" because it dirstate defines very important data for the understanding of the repo. Having it a the same place as other extensions and config hook will probably lead to some ordering hell. >> I would be happy with wither 1-1 (weak-ref >> please) or 1-2. With a preference for 1-2. >> >> You can make the repo argument optional, with a devel warning if it is >> missing for example. To avoid too much breakage for a couple of version. > > BTW, what is your strong reason to choose 1-2 ? to avoid using > weak ref of repo ? I do not have a strong opinion about 1-1 vs 1-2. 1-2 avoid keeping the cyclic reference in the object design. We use the pattern of "providing a repo" when needed in multiple place. It also help refactoring code to have the argument explicitly passed. because it ensure function calling write have access to all necessary argument (here repo). I think Matt have a strong opinion agains weakref. >>> 2. in-memory dirstate changes should be discarded at failure of >>> transaction, to prevent them from being written out at (outer) >>> 'wlock.release' or so >>> >>> I'm planning to do this via "the end of transaction" callback. >>> >>> >>> Using 'dirstate.{begin|end}parentchange()' mechanism as below also >>> works similarly. >>> >>> - invoke beginparentchange at starting transaction >>> - invoke endparentchange at (successfully) closing transaction >>> - wlock.release at failure invokes 'dirstate.invalidate()' >>> >>> But it seems too optimistic for outside transaction :-), and it >>> can't ensure consistency just at the end of transaction. >> >> I'm a bit confused about why you would invoe dirstate.invalidate() at >> lock release. Can't we have something invalidating the dirstate content >> when transaction if rolledback (so that dirstate is read from disk? > > (at first, 'dirstate.{begin|end}parentchange()' example above is > just a example to do similar thing in another way, but not what I > want to do) > >> Do we have something forcing dirstate flushing at the transaction >> openning? (so that the backup we restore is okay?) > > AFAIK, with current implementation: > > - in-memory dirstate changes aren't explicitly written out at opening > transaction > > In almost all cases, in-memory dirstate changes seem to be > INDIRECTLY written out before transaction opening. But it isn't > explicitly ensured. > > This causes: > > 1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at > opening transaction, and > > 2. restoring one from (incorrect) '.hg/undo.dirstate', which is > renamed from '.hg/journal.dirstate' above, at rollback > > (Oops, I have overlooked that journal file is created not from > dirstate instance but from '.hg/dirstate' file at opening > transaction :-< we have to fix this at first) > > For example, at "hg backout --commit", dirstate changes via > reverting aren't written out until 'ctx.markcommitted()' > after closing the transaction. And, "hg status" after "hg > rollback" shows unexpected status for added/removed files. > > BTW, after my previous change fe03f522dda9, this issue can be > reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing > on some unsure files :-) > > https://selenic.com/repo/hg/rev/fe03f522dda9 It seems like we should never start a transaction with a dirty dirstate. Is there any theoritical reason for not doing it? > > - in-memory dirstate changes aren't explicitly discarded at > aborting transaction > > (for convenience, let's assume that '.hg/journal.dirstate' is > correct, at this point) > > As you described, aborting transaction restores '.hg/dirstate' from > '.hg/journal.dirstate' saved at opening transaction. > > But in-memory changes during transaction are still kept in > 'dirstate' instance, and they may be written into already restored > '.hg/dirstate' at 'wlock.release()' or so. This sounds wrong (but less keep moving) > > BTW, commands almost fully enclosed by transaction treat "in-memory > changes during aborted transaction" in ways below: > > - in-memory changes are discarded by 'dirstateguard' OUTSIDE > transaction scope > ('hg qpush', 'hg commit --amend', and 'hg import') > > "discarding in-memory dirstate changes at aborting > transaction" will make dirstateguard in these cases > useless. > > - in-memory changes should be discarded, but not > ('hg transplant') I do not understand this sentence. > Aborting for other than conflict may cause unexpected dirstate > (e.g. referring rollbacked revision as parent). This should be > fixed. > > - discarding in-memory changes causes problem > ('hg shelve' and 'hg unshelve') > > Current shelve/unshelve implementation expects aborting > transaction NOT to discard in-memory dirstate changes intentionally > or unintentionally. > > 1. start transaction > 2. do shelve/unshelve, and this makes temporary revisions > 3. update working directory to the parent revision at (1) > (+ some additional actions) > - dirstate is different from one at (1), because > some changes are shelved/unshelved > 4. abort transaction intentionally > - rollback revisions created at (2) > - restore '.hg/dirstate' from one at (1) > 5. write dirstate at 'wlock.release' > - here, '.hg/dirstate' is equal to one at (3) > > IMHO, "strip by aborting current transaction" at (4) above > should be replaced by "strip by rollbacking the transaction > after once closing it". > > The latter doesn't strip the parent of the working directory at > stripping because of updating at (3): this is non-"parentgone" > case. Then, dirstate is kept as same as one at (3) above. Is there anything else than shelve relying on this. Having shelve relying on a wrong/buggy behavior from core seems debatable. It should not block us to fix core. > > Let me confirm how dirstate should be treated at the border of > transaction scope. > > - at opening transaction: > > - write in-memory dirstate changes into '.hg/dirstate' (not yet) > > - at successfully closing transaction: > > - write changes during transaction into '.hg/dirstate' (not yet) > - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK) > > - at aborting transaction: > > - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK) > - discard in-memory dirstate changes (not yet) This summary looks perfect. Any objection/issue with doing this (beside shelve abuse?) >>>>> But current implementation doesn't invoke abort callbacks, if there is >>>>> no change on repository data at aborting transaction: e.g. "hg import" >>>>> fails to import the first patch. >>>> >>>> The change to dirstate should most probably register himself as change >>>> to the transaction. That would trigger it. >>> >>> I found two problems of 'addfilegenerator()' below. IMHO, the former >>> is a major problem, and the latter is minor one (but you should >>> dislike it as later :-)). >> >> If addfilegenerator fails to fit our needs here, we should probably >> hammer it so it does. >> >>> And, both of my fixing plans cause omitting invocation of abort >>> callbacks, because they makes 'transaction._backupentries' empty :-< >>> >>> - current 'addfilegenerator()' causes forcibly restoring >>> '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change >>> parents of the working directory >>> >>> This causes unexpected "hg status" output at rollback-ing in such >>> situation (aka non-"parentgone" case). >> >> Do you mean there is category of change to dirstate that must not be >> rolledback if transaction fails? I do not see data about this on the >> wikipage (but I could be blind) Can you elaborate? >> >> I'm not sure to understand the issue here. > > "rollback" above is not one by aborting while transaction running, but > one by "hg rollback"/"repo.rollback()" after once closing transaction. > > In the later case, parents of the working directory may differ > from one at opening transaction, and forcible restoring dirstate > causes unexpected result of subsequent "hg status". > > Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be > restored at rollback, and it causes forcible restoring. > > 'repo.rollback()' avoids this problem by restoring dirstate by itself. > > https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110 I'm not sure about what this section (of the email) is about. Look like the linked code is close to what we need to invalidate the dirstate in memory content. But its late so I'm giving up with this section. I think we made good enough progress on the other ones. >>> I'm planning to introduce 'backup' optional argument, which controls >>> 'addbackup()' invocation in '_generatefiles()' to avoid issue above, >>> to 'addfilegenerator()'. >>> >>> https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267 >> >> That sound wrong to me. We either have change to be apply on transaction >> success and rolledback on transaction failure. But I can't see case were >> backup will be wrong. >> >> >>> According to discussion before, dirstate writes changes into (a) >>> '.hg/dirstate' at the beginning of transaction and (b) >>> '.hg/dirstate.pending' while transaction running. Then, discarding >>> the latter can work as a kind of restoring. So, omitting >>> 'addbackup()' invocation doesn't cause problem at least for dirstate. >>> >>> Or is another trick like "add to _backupentries, but omit backup >>> itself" better ? >> >> The backup are also used for rollback (and hg recover). We must keep it. >> >>> >>> - current 'addfilegenerator()' causes forcibly writing data out at >>> 'writepending()', even if data hasn't been changed since last >>> writing out >>> >>> When external hooks are invoked while transaction, this causes >>> additional "transaction abort!" and "rollback completed" output, >>> even though dirstate hasn't been changed since the beginning of >>> transaction. >>> >>> For example, pushing changes/bookmarks causes writing into >>> 'dirstate.pending' at REMOTE side, even though it never uses and >>> changes dirstate. >> >> If there is no change to the dirstate. Why are we recording any change >> into the transaction at all? > > I understood that 'addfilegenerator()' is used for files which may be > changed during transaction like "journal", and invoked it for dirstate > at opening transaction. > > Should I use it after examination whether dirstate has been changed > since opening transaction ? The current semantic is: I've new content, here is how to generated it on disk. So it should be called by the code updating dirstate to "commit" the change. It should replace (or be contained in) dirstate.write() calls. (we should probably update the docstring to make it clean)
At Sat, 18 Jul 2015 01:48:24 +0200, Pierre-Yves David wrote: > > On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote: > > > > At Tue, 14 Jul 2015 13:26:14 +0200, > > Pierre-Yves David wrote: > >> > >> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: > >>> > >>> > >>> At Sat, 11 Jul 2015 20:47:00 +0100, > >>> Pierre-Yves David wrote: > >>> > >>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > >>>>> Now, I'm working for dirstate consistency around inside and border of > >>>>> transaction, and planning to use this abort callback mechanism to know > >>>>> the end of transaction at failure. > >>>> > >>>> I'm a bit curious about what code you need to run in such case. If you > >>>> use the file generator mechanism you do not need to worries about > >>>> restoring backup. > >>>> > >>>> However, I'm not sure how rolling back the file would atomatically > >>>> invalide the in memory object content (probably not). Is this why you > >>>> need a callback? could we (would it make sense to) integrate that as > >>>> part of one of our current cache invalidation mechanism? > >>> > >>> > >>> I'm planning to use callback mechanism for purposes below: > >>> > >>> 1. to make existing 'dirstate.write()' invocations write in-memory > >>> changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), > >>> 'dirstate.write()' should know whether transaction is running or not > >>> > >>> I'm planning changes below for this purpose: > >>> > >>> - invoke (newly added) 'dirstate.begintransaction()' at the > >>> beginning of transaction in 'localrepo.transaction()' > >>> > >>> - invoke (newly added) 'dirstate.endtransaction()' at the end of > >>> transaction via "the end of transaction" callback (instead of > >>> finalize/abort callbacks) > >>> > >>> > >>> There are some other ways: > >>> > >>> 1-1. instantiate dirstate with 'repo' > >>> (or weak ref of repo to avoid circular reference ?) > >>> > >>> 1-2. make 'dirstate.write()' require 'localrepo' argument > >>> > >>> This seems too large impact (especially for 3rd party > >>> extensions, even though Mercurial doesn't ensure compatibility > >>> of internal API at all :-)) > >>> > >>> In both cases, 'dirstate.write()' can determine appropriate output > >>> file by examination of 'repo.currenttransaction()'. > >> > >> I would prefer one of this options. Using hooks for such core code looks > >> a bit hacky and fragile. > > > > How about something new like 'unlock' of 'lock' (or 'after' of > > transaction) ? Would you think this kind of 'call back' mechanism > > itself is also fragile for core code (if newly added) ? > > I do not understand what you mean. > > I'm worried about hook approach being "fragile" because it dirstate > defines very important data for the understanding of the repo. Having it > a the same place as other extensions and config hook will probably lead > to some ordering hell. (Oops, 'unlock' is just a name of a local variable in 'localrepo.lock()'. Correct name of constructor argument is 'releasefn') As you said, using 'hook' may lead to some ordering hell by other extensions and config hook. What we want is a kind of 'releasefn' callback for 'lock' class, which is called at the end of lock scope via invocation of own 'release()' (or via '__del__' of itself). 'transaction' constructor takes 'after' callback argument, but it is called only when transaction is successfully closing. To provide safe notification mechanism, which is isolated from hook infrastructure, how about newly introducing a kind of 'releasefn' for 'lock' into 'transaction' ? BTW, 'close'/'closing' are already used as a meaning of 'successful end of transaction' in many situations (e.g. name of hook). What name is suitable for the hook invoked at "the end of transaction scope" ? > >> I would be happy with wither 1-1 (weak-ref > >> please) or 1-2. With a preference for 1-2. > >> > >> You can make the repo argument optional, with a devel warning if it is > >> missing for example. To avoid too much breakage for a couple of version. > > > > BTW, what is your strong reason to choose 1-2 ? to avoid using > > weak ref of repo ? > > I do not have a strong opinion about 1-1 vs 1-2. > > 1-2 avoid keeping the cyclic reference in the object design. We use the > pattern of "providing a repo" when needed in multiple place. It also > help refactoring code to have the argument explicitly passed. because it > ensure function calling write have access to all necessary argument > (here repo). > > I think Matt have a strong opinion agains weakref. I see. > >>> 2. in-memory dirstate changes should be discarded at failure of > >>> transaction, to prevent them from being written out at (outer) > >>> 'wlock.release' or so > >>> > >>> I'm planning to do this via "the end of transaction" callback. > >>> > >>> > >>> Using 'dirstate.{begin|end}parentchange()' mechanism as below also > >>> works similarly. > >>> > >>> - invoke beginparentchange at starting transaction > >>> - invoke endparentchange at (successfully) closing transaction > >>> - wlock.release at failure invokes 'dirstate.invalidate()' > >>> > >>> But it seems too optimistic for outside transaction :-), and it > >>> can't ensure consistency just at the end of transaction. > >> > >> I'm a bit confused about why you would invoe dirstate.invalidate() at > >> lock release. Can't we have something invalidating the dirstate content > >> when transaction if rolledback (so that dirstate is read from disk? > > > > (at first, 'dirstate.{begin|end}parentchange()' example above is > > just a example to do similar thing in another way, but not what I > > want to do) > > > >> Do we have something forcing dirstate flushing at the transaction > >> openning? (so that the backup we restore is okay?) > > > > AFAIK, with current implementation: > > > > - in-memory dirstate changes aren't explicitly written out at opening > > transaction > > > > In almost all cases, in-memory dirstate changes seem to be > > INDIRECTLY written out before transaction opening. But it isn't > > explicitly ensured. > > > > This causes: > > > > 1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at > > opening transaction, and > > > > 2. restoring one from (incorrect) '.hg/undo.dirstate', which is > > renamed from '.hg/journal.dirstate' above, at rollback > > > > (Oops, I have overlooked that journal file is created not from > > dirstate instance but from '.hg/dirstate' file at opening > > transaction :-< we have to fix this at first) > > > > For example, at "hg backout --commit", dirstate changes via > > reverting aren't written out until 'ctx.markcommitted()' > > after closing the transaction. And, "hg status" after "hg > > rollback" shows unexpected status for added/removed files. > > > > BTW, after my previous change fe03f522dda9, this issue can be > > reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing > > on some unsure files :-) > > > > https://selenic.com/repo/hg/rev/fe03f522dda9 > > It seems like we should never start a transaction with a dirty dirstate. > Is there any theoritical reason for not doing it? I don't know any theoretical reason for not doing it, and this issue was fixed by 800e090e9c64. https://selenic.com/repo/hg/rev/800e090e9c64 > > > > - in-memory dirstate changes aren't explicitly discarded at > > aborting transaction > > > > (for convenience, let's assume that '.hg/journal.dirstate' is > > correct, at this point) > > > > As you described, aborting transaction restores '.hg/dirstate' from > > '.hg/journal.dirstate' saved at opening transaction. > > > > But in-memory changes during transaction are still kept in > > 'dirstate' instance, and they may be written into already restored > > '.hg/dirstate' at 'wlock.release()' or so. > > This sounds wrong (but less keep moving) > > > > > BTW, commands almost fully enclosed by transaction treat "in-memory > > changes during aborted transaction" in ways below: > > > > - in-memory changes are discarded by 'dirstateguard' OUTSIDE > > transaction scope > > ('hg qpush', 'hg commit --amend', and 'hg import') > > > > "discarding in-memory dirstate changes at aborting > > transaction" will make dirstateguard in these cases > > useless. > > > > - in-memory changes should be discarded, but not > > ('hg transplant') > > I do not understand this sentence. This issue of transplant was fixed by 99e88320d665. https://selenic.com/repo/hg/rev/99e88320d665 > > Aborting for other than conflict may cause unexpected dirstate > > (e.g. referring rollbacked revision as parent). This should be > > fixed. > > > > - discarding in-memory changes causes problem > > ('hg shelve' and 'hg unshelve') > > > > Current shelve/unshelve implementation expects aborting > > transaction NOT to discard in-memory dirstate changes intentionally > > or unintentionally. > > > > 1. start transaction > > 2. do shelve/unshelve, and this makes temporary revisions > > 3. update working directory to the parent revision at (1) > > (+ some additional actions) > > - dirstate is different from one at (1), because > > some changes are shelved/unshelved > > 4. abort transaction intentionally > > - rollback revisions created at (2) > > - restore '.hg/dirstate' from one at (1) > > 5. write dirstate at 'wlock.release' > > - here, '.hg/dirstate' is equal to one at (3) > > > > IMHO, "strip by aborting current transaction" at (4) above > > should be replaced by "strip by rollbacking the transaction > > after once closing it". > > > > The latter doesn't strip the parent of the working directory at > > stripping because of updating at (3): this is non-"parentgone" > > case. Then, dirstate is kept as same as one at (3) above. > > Is there anything else than shelve relying on this. Having shelve > relying on a wrong/buggy behavior from core seems debatable. > It should not block us to fix core. AFAIK, only shelve relies on this wrong/buggy behavior. I confirmed that "explicitly rollbacking transaction just after committing it" can replace current problematic implementation in shelve. > > Let me confirm how dirstate should be treated at the border of > > transaction scope. > > > > - at opening transaction: > > > > - write in-memory dirstate changes into '.hg/dirstate' (not yet) > > > > - at successfully closing transaction: > > > > - write changes during transaction into '.hg/dirstate' (not yet) > > - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK) > > > > - at aborting transaction: > > > > - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK) > > - discard in-memory dirstate changes (not yet) > > This summary looks perfect. Any objection/issue with doing this (beside > shelve abuse?) > > >>>>> But current implementation doesn't invoke abort callbacks, if there is > >>>>> no change on repository data at aborting transaction: e.g. "hg import" > >>>>> fails to import the first patch. > >>>> > >>>> The change to dirstate should most probably register himself as change > >>>> to the transaction. That would trigger it. > >>> > >>> I found two problems of 'addfilegenerator()' below. IMHO, the former > >>> is a major problem, and the latter is minor one (but you should > >>> dislike it as later :-)). > >> > >> If addfilegenerator fails to fit our needs here, we should probably > >> hammer it so it does. > >> > >>> And, both of my fixing plans cause omitting invocation of abort > >>> callbacks, because they makes 'transaction._backupentries' empty :-< > >>> > >>> - current 'addfilegenerator()' causes forcibly restoring > >>> '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change > >>> parents of the working directory > >>> > >>> This causes unexpected "hg status" output at rollback-ing in such > >>> situation (aka non-"parentgone" case). > >> > >> Do you mean there is category of change to dirstate that must not be > >> rolledback if transaction fails? I do not see data about this on the > >> wikipage (but I could be blind) Can you elaborate? > >> > >> I'm not sure to understand the issue here. > > > > "rollback" above is not one by aborting while transaction running, but > > one by "hg rollback"/"repo.rollback()" after once closing transaction. > > > > In the later case, parents of the working directory may differ > > from one at opening transaction, and forcible restoring dirstate > > causes unexpected result of subsequent "hg status". > > > > Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be > > restored at rollback, and it causes forcible restoring. > > > > 'repo.rollback()' avoids this problem by restoring dirstate by itself. > > > > https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110 I put the link again but use concrete changeset ID instead of 'tip', for referring safely in the future :-) https://selenic.com/repo/hg/file/a7527c5769bb/mercurial/localrepo.py#l1116 > I'm not sure about what this section (of the email) is about. Look like > the linked code is close to what we need to invalidate the dirstate in > memory content. But its late so I'm giving up with this section. I think > we made good enough progress on the other ones. 'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to restore dirstate (1) at aborting transaction, and (2) at "hg rollback" after committing transaction. In the former case, restoring dirstate from 'journal.dirstate' is always correct (no explanation is needed, isn't it ? :-)) In the latter case, the working directory may be updated to the revision other than one(s) committed in previous transaction. If NOT so (case 2-1), dirstate should be restored like case (1), because current parent revision will go away after rollbacking. This situation is called as "parentgone". But otherwise (case 2-2), dirstate shouldn't be restored from 'undo.dirstate', because this restoring is equal to unexpected "hg debugsetparent" (+ breaking file status) for users. Here, current 'addfilegenerator()' implementation forcibly puts the specified file name into 'journal.backupfiles' (or 'undo.backupfiles' renamed from it) to restore changed files at aborting (or rollbacking) transaction. Then, using 'addfilegenerator()' always causes restoring dirstate via 'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1) or not (2-2), even though 'localrepo._rollback()' tries to avoid it in case (2-2). As described above, this is unexpected "hg debugsetparent" (+ breaking file status). > >>> I'm planning to introduce 'backup' optional argument, which controls > >>> 'addbackup()' invocation in '_generatefiles()' to avoid issue above, > >>> to 'addfilegenerator()'. > >>> > >>> https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267 > >> > >> That sound wrong to me. We either have change to be apply on transaction > >> success and rolledback on transaction failure. But I can't see case were > >> backup will be wrong. > >> > >> > >>> According to discussion before, dirstate writes changes into (a) > >>> '.hg/dirstate' at the beginning of transaction and (b) > >>> '.hg/dirstate.pending' while transaction running. Then, discarding > >>> the latter can work as a kind of restoring. So, omitting > >>> 'addbackup()' invocation doesn't cause problem at least for dirstate. > >>> > >>> Or is another trick like "add to _backupentries, but omit backup > >>> itself" better ? > >> > >> The backup are also used for rollback (and hg recover). We must keep it. > >> > >>> > >>> - current 'addfilegenerator()' causes forcibly writing data out at > >>> 'writepending()', even if data hasn't been changed since last > >>> writing out > >>> > >>> When external hooks are invoked while transaction, this causes > >>> additional "transaction abort!" and "rollback completed" output, > >>> even though dirstate hasn't been changed since the beginning of > >>> transaction. > >>> > >>> For example, pushing changes/bookmarks causes writing into > >>> 'dirstate.pending' at REMOTE side, even though it never uses and > >>> changes dirstate. > >> > >> If there is no change to the dirstate. Why are we recording any change > >> into the transaction at all? > > > > I understood that 'addfilegenerator()' is used for files which may be > > changed during transaction like "journal", and invoked it for dirstate > > at opening transaction. > > > > Should I use it after examination whether dirstate has been changed > > since opening transaction ? > > The current semantic is: > > I've new content, here is how to generated it on disk. > > So it should be called by the code updating dirstate to "commit" the > change. It should replace (or be contained in) dirstate.write() calls. > > (we should probably update the docstring to make it clean) I see. > -- > Pierre-Yves David > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
At Thu, 13 Aug 2015 12:17:22 +0900, FUJIWARA Katsunori wrote: > > > At Sat, 18 Jul 2015 01:48:24 +0200, > Pierre-Yves David wrote: > > > > On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote: > > > > > > At Tue, 14 Jul 2015 13:26:14 +0200, > > > Pierre-Yves David wrote: > > >> > > >> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: > > >>> > > >>> > > >>> At Sat, 11 Jul 2015 20:47:00 +0100, > > >>> Pierre-Yves David wrote: > > >>> > > >>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > > >>>>> Now, I'm working for dirstate consistency around inside and border of > > >>>>> transaction, and planning to use this abort callback mechanism to know > > >>>>> the end of transaction at failure. > > >>>> > > >>>> I'm a bit curious about what code you need to run in such case. If you > > >>>> use the file generator mechanism you do not need to worries about > > >>>> restoring backup. > > >>>> > > >>>> However, I'm not sure how rolling back the file would atomatically > > >>>> invalide the in memory object content (probably not). Is this why you > > >>>> need a callback? could we (would it make sense to) integrate that as > > >>>> part of one of our current cache invalidation mechanism? > > >>> > > >>> > > >>> I'm planning to use callback mechanism for purposes below: > > >>> > > >>> 1. to make existing 'dirstate.write()' invocations write in-memory > > >>> changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), > > >>> 'dirstate.write()' should know whether transaction is running or not > > >>> > > >>> I'm planning changes below for this purpose: > > >>> > > >>> - invoke (newly added) 'dirstate.begintransaction()' at the > > >>> beginning of transaction in 'localrepo.transaction()' > > >>> > > >>> - invoke (newly added) 'dirstate.endtransaction()' at the end of > > >>> transaction via "the end of transaction" callback (instead of > > >>> finalize/abort callbacks) > > >>> > > >>> > > >>> There are some other ways: > > >>> > > >>> 1-1. instantiate dirstate with 'repo' > > >>> (or weak ref of repo to avoid circular reference ?) > > >>> > > >>> 1-2. make 'dirstate.write()' require 'localrepo' argument > > >>> > > >>> This seems too large impact (especially for 3rd party > > >>> extensions, even though Mercurial doesn't ensure compatibility > > >>> of internal API at all :-)) > > >>> > > >>> In both cases, 'dirstate.write()' can determine appropriate output > > >>> file by examination of 'repo.currenttransaction()'. > > >> > > >> I would prefer one of this options. Using hooks for such core code looks > > >> a bit hacky and fragile. > > > > > > How about something new like 'unlock' of 'lock' (or 'after' of > > > transaction) ? Would you think this kind of 'call back' mechanism > > > itself is also fragile for core code (if newly added) ? > > > > I do not understand what you mean. > > > > I'm worried about hook approach being "fragile" because it dirstate > > defines very important data for the understanding of the repo. Having it > > a the same place as other extensions and config hook will probably lead > > to some ordering hell. > > (Oops, 'unlock' is just a name of a local variable in ^^^^^^^^^^^^^^ > 'localrepo.lock()'. Correct name of constructor argument is > 'releasefn') "local variable" => "local function" ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
On 08/12/2015 08:17 PM, FUJIWARA Katsunori wrote: > > At Sat, 18 Jul 2015 01:48:24 +0200, > Pierre-Yves David wrote: >> >> On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote: >>> >>> At Tue, 14 Jul 2015 13:26:14 +0200, >>> Pierre-Yves David wrote: >>>> >>>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: >>>>> >>>>> >>>>> At Sat, 11 Jul 2015 20:47:00 +0100, >>>>> Pierre-Yves David wrote: >>>>> >>>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: >>>>>>> Now, I'm working for dirstate consistency around inside and border of >>>>>>> transaction, and planning to use this abort callback mechanism to know >>>>>>> the end of transaction at failure. >>>>>> >>>>>> I'm a bit curious about what code you need to run in such case. If you >>>>>> use the file generator mechanism you do not need to worries about >>>>>> restoring backup. >>>>>> >>>>>> However, I'm not sure how rolling back the file would atomatically >>>>>> invalide the in memory object content (probably not). Is this why you >>>>>> need a callback? could we (would it make sense to) integrate that as >>>>>> part of one of our current cache invalidation mechanism? >>>>> >>>>> >>>>> I'm planning to use callback mechanism for purposes below: >>>>> >>>>> 1. to make existing 'dirstate.write()' invocations write in-memory >>>>> changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), >>>>> 'dirstate.write()' should know whether transaction is running or not >>>>> >>>>> I'm planning changes below for this purpose: >>>>> >>>>> - invoke (newly added) 'dirstate.begintransaction()' at the >>>>> beginning of transaction in 'localrepo.transaction()' >>>>> >>>>> - invoke (newly added) 'dirstate.endtransaction()' at the end of >>>>> transaction via "the end of transaction" callback (instead of >>>>> finalize/abort callbacks) >>>>> >>>>> >>>>> There are some other ways: >>>>> >>>>> 1-1. instantiate dirstate with 'repo' >>>>> (or weak ref of repo to avoid circular reference ?) >>>>> >>>>> 1-2. make 'dirstate.write()' require 'localrepo' argument >>>>> >>>>> This seems too large impact (especially for 3rd party >>>>> extensions, even though Mercurial doesn't ensure compatibility >>>>> of internal API at all :-)) >>>>> >>>>> In both cases, 'dirstate.write()' can determine appropriate output >>>>> file by examination of 'repo.currenttransaction()'. >>>> >>>> I would prefer one of this options. Using hooks for such core code looks >>>> a bit hacky and fragile. >>> >>> How about something new like 'unlock' of 'lock' (or 'after' of >>> transaction) ? Would you think this kind of 'call back' mechanism >>> itself is also fragile for core code (if newly added) ? >> >> I do not understand what you mean. >> >> I'm worried about hook approach being "fragile" because it dirstate >> defines very important data for the understanding of the repo. Having it >> a the same place as other extensions and config hook will probably lead >> to some ordering hell. > > (Oops, 'unlock' is just a name of a local variable in > 'localrepo.lock()'. Correct name of constructor argument is > 'releasefn') > > As you said, using 'hook' may lead to some ordering hell by other > extensions and config hook. > > What we want is a kind of 'releasefn' callback for 'lock' class, which > is called at the end of lock scope via invocation of own 'release()' > (or via '__del__' of itself). > > 'transaction' constructor takes 'after' callback argument, but it is > called only when transaction is successfully closing. > > To provide safe notification mechanism, which is isolated from hook > infrastructure, how about newly introducing a kind of 'releasefn' for > 'lock' into 'transaction' ? > > BTW, 'close'/'closing' are already used as a meaning of 'successful > end of transaction' in many situations (e.g. name of hook). What name > is suitable for the hook invoked at "the end of transaction scope" ? I think "release" would be the right terms here. I'm however a bit lost about why we are diving so much into the hooking question. File generator should allow use to get most of what we need, doesn't it? >> It seems like we should never start a transaction with a dirty dirstate. >> Is there any theoritical reason for not doing it? > > I don't know any theoretical reason for not doing it, and this issue > was fixed by 800e090e9c64. > > https://selenic.com/repo/hg/rev/800e090e9c64 Good, let's enforce it then (or at least add a devel-warning. >> Is there anything else than shelve relying on this. Having shelve >> relying on a wrong/buggy behavior from core seems debatable. >> It should not block us to fix core. > > AFAIK, only shelve relies on this wrong/buggy behavior. > > I confirmed that "explicitly rollbacking transaction just after > committing it" can replace current problematic implementation in > shelve. Let's do that then. >>> Let me confirm how dirstate should be treated at the border of >>> transaction scope. >>> >>> - at opening transaction: >>> >>> - write in-memory dirstate changes into '.hg/dirstate' (not yet) >>> >>> - at successfully closing transaction: >>> >>> - write changes during transaction into '.hg/dirstate' (not yet) >>> - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK) >>> >>> - at aborting transaction: >>> >>> - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK) >>> - discard in-memory dirstate changes (not yet) >> >> This summary looks perfect. Any objection/issue with doing this (beside >> shelve abuse?) >> >>>>>>> But current implementation doesn't invoke abort callbacks, if there is >>>>>>> no change on repository data at aborting transaction: e.g. "hg import" >>>>>>> fails to import the first patch. >>>>>> >>>>>> The change to dirstate should most probably register himself as change >>>>>> to the transaction. That would trigger it. >>>>> >>>>> I found two problems of 'addfilegenerator()' below. IMHO, the former >>>>> is a major problem, and the latter is minor one (but you should >>>>> dislike it as later :-)). >>>> >>>> If addfilegenerator fails to fit our needs here, we should probably >>>> hammer it so it does. >>>> >>>>> And, both of my fixing plans cause omitting invocation of abort >>>>> callbacks, because they makes 'transaction._backupentries' empty :-< >>>>> >>>>> - current 'addfilegenerator()' causes forcibly restoring >>>>> '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change >>>>> parents of the working directory >>>>> >>>>> This causes unexpected "hg status" output at rollback-ing in such >>>>> situation (aka non-"parentgone" case). >>>> >>>> Do you mean there is category of change to dirstate that must not be >>>> rolledback if transaction fails? I do not see data about this on the >>>> wikipage (but I could be blind) Can you elaborate? >>>> >>>> I'm not sure to understand the issue here. >>> >>> "rollback" above is not one by aborting while transaction running, but >>> one by "hg rollback"/"repo.rollback()" after once closing transaction. >>> >>> In the later case, parents of the working directory may differ >>> from one at opening transaction, and forcible restoring dirstate >>> causes unexpected result of subsequent "hg status". >>> >>> Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be >>> restored at rollback, and it causes forcible restoring. >>> >>> 'repo.rollback()' avoids this problem by restoring dirstate by itself. >>> >>> https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110 > > I put the link again but use concrete changeset ID instead of 'tip', > for referring safely in the future :-) > > https://selenic.com/repo/hg/file/a7527c5769bb/mercurial/localrepo.py#l1116 >> I'm not sure about what this section (of the email) is about. Look like >> the linked code is close to what we need to invalidate the dirstate in >> memory content. But its late so I'm giving up with this section. I think >> we made good enough progress on the other ones. > > 'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to > restore dirstate (1) at aborting transaction, and (2) at "hg rollback" > after committing transaction. > In the former case, restoring dirstate from 'journal.dirstate' is > always correct (no explanation is needed, isn't it ? :-)) I would says so (file state may be a bit fun, but…) > In the latter case, the working directory may be updated to the > revision other than one(s) committed in previous transaction. > > If NOT so (case 2-1), dirstate should be restored like case (1), > because current parent revision will go away after rollbacking. This > situation is called as "parentgone". > > But otherwise (case 2-2), dirstate shouldn't be restored from > 'undo.dirstate', because this restoring is equal to unexpected "hg > debugsetparent" (+ breaking file status) for users. urg, I see. > Here, current 'addfilegenerator()' implementation forcibly puts the > specified file name into 'journal.backupfiles' (or 'undo.backupfiles' > renamed from it) to restore changed files at aborting (or rollbacking) > transaction. > > Then, using 'addfilegenerator()' always causes restoring dirstate via > 'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1) > or not (2-2), even though 'localrepo._rollback()' tries to avoid it in > case (2-2). As described above, this is unexpected "hg debugsetparent" > (+ breaking file status). I see, this is unfortunate. Is there other working copy specific file we should be careful about too? Thanks for the details!
At Mon, 17 Aug 2015 15:16:51 -0700, Pierre-Yves David wrote: > > On 08/12/2015 08:17 PM, FUJIWARA Katsunori wrote: > > > > At Sat, 18 Jul 2015 01:48:24 +0200, > > Pierre-Yves David wrote: > >> > >> On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote: > >>> > >>> At Tue, 14 Jul 2015 13:26:14 +0200, > >>> Pierre-Yves David wrote: > >>>> > >>>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote: > >>>>> > >>>>> At Sat, 11 Jul 2015 20:47:00 +0100, > >>>>> Pierre-Yves David wrote: > >>>>> > >>>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote: > >>>>>> > >>>>>>> Now, I'm working for dirstate consistency around inside and border of > >>>>>>> transaction, and planning to use this abort callback mechanism to know > >>>>>>> the end of transaction at failure. > >>>>>> > >>>>>> I'm a bit curious about what code you need to run in such case. If you > >>>>>> use the file generator mechanism you do not need to worries about > >>>>>> restoring backup. > >>>>>> > >>>>>> However, I'm not sure how rolling back the file would atomatically > >>>>>> invalide the in memory object content (probably not). Is this why you > >>>>>> need a callback? could we (would it make sense to) integrate that as > >>>>>> part of one of our current cache invalidation mechanism? > >>>>> > >>>>> > >>>>> I'm planning to use callback mechanism for purposes below: > >>>>> > >>>>> 1. to make existing 'dirstate.write()' invocations write in-memory > >>>>> changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'), > >>>>> 'dirstate.write()' should know whether transaction is running or not > >>>>> > >>>>> I'm planning changes below for this purpose: > >>>>> > >>>>> - invoke (newly added) 'dirstate.begintransaction()' at the > >>>>> beginning of transaction in 'localrepo.transaction()' > >>>>> > >>>>> - invoke (newly added) 'dirstate.endtransaction()' at the end of > >>>>> transaction via "the end of transaction" callback (instead of > >>>>> finalize/abort callbacks) > >>>>> > >>>>> > >>>>> There are some other ways: > >>>>> > >>>>> 1-1. instantiate dirstate with 'repo' > >>>>> (or weak ref of repo to avoid circular reference ?) > >>>>> > >>>>> 1-2. make 'dirstate.write()' require 'localrepo' argument > >>>>> > >>>>> This seems too large impact (especially for 3rd party > >>>>> extensions, even though Mercurial doesn't ensure compatibility > >>>>> of internal API at all :-)) > >>>>> > >>>>> In both cases, 'dirstate.write()' can determine appropriate output > >>>>> file by examination of 'repo.currenttransaction()'. > >>>> > >>>> I would prefer one of this options. Using hooks for such core code looks > >>>> a bit hacky and fragile. > >>> > >>> How about something new like 'unlock' of 'lock' (or 'after' of > >>> transaction) ? Would you think this kind of 'call back' mechanism > >>> itself is also fragile for core code (if newly added) ? > >> > >> I do not understand what you mean. > >> > >> I'm worried about hook approach being "fragile" because it dirstate > >> defines very important data for the understanding of the repo. Having it > >> a the same place as other extensions and config hook will probably lead > >> to some ordering hell. > > > > (Oops, 'unlock' is just a name of a local variable in > > 'localrepo.lock()'. Correct name of constructor argument is > > 'releasefn') > > > > As you said, using 'hook' may lead to some ordering hell by other > > extensions and config hook. > > > > What we want is a kind of 'releasefn' callback for 'lock' class, which > > is called at the end of lock scope via invocation of own 'release()' > > (or via '__del__' of itself). > > > > 'transaction' constructor takes 'after' callback argument, but it is > > called only when transaction is successfully closing. > > > > To provide safe notification mechanism, which is isolated from hook > > infrastructure, how about newly introducing a kind of 'releasefn' for > > 'lock' into 'transaction' ? > > > > BTW, 'close'/'closing' are already used as a meaning of 'successful > > end of transaction' in many situations (e.g. name of hook). What name > > is suitable for the hook invoked at "the end of transaction scope" ? > > I think "release" would be the right terms here. I'm however a bit lost > about why we are diving so much into the hooking question. File > generator should allow use to get most of what we need, doesn't it? We need "the end of transaction scope" callback to discarded in-memory dirstate changes at failure, because: - '.hg/dirstate' file itself is restored from '.hg/journal.dirstate' at failure of transaction as expected, but - in-memory dirstate changes aren't discarded even at failure of transaction, therefore - outer 'wlock.release()' causes writing such harmful in-memory changes out (in this case, in-memory changes may refer already stripped revisions) In some cases, 'dirstateguard' avoids such unexpected writing dirstate changes out outside transaction scope, but this should be done inside transaction scope, IMHO. > >> I'm not sure about what this section (of the email) is about. Look like > >> the linked code is close to what we need to invalidate the dirstate in > >> memory content. But its late so I'm giving up with this section. I think > >> we made good enough progress on the other ones. > > > > 'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to > > restore dirstate (1) at aborting transaction, and (2) at "hg rollback" > > after committing transaction. > > > > In the former case, restoring dirstate from 'journal.dirstate' is > > always correct (no explanation is needed, isn't it ? :-)) > > I would says so (file state may be a bit fun, but…) > > > > In the latter case, the working directory may be updated to the > > revision other than one(s) committed in previous transaction. > > > > If NOT so (case 2-1), dirstate should be restored like case (1), > > because current parent revision will go away after rollbacking. This > > situation is called as "parentgone". > > > > But otherwise (case 2-2), dirstate shouldn't be restored from > > 'undo.dirstate', because this restoring is equal to unexpected "hg > > debugsetparent" (+ breaking file status) for users. > > urg, I see. > > > Here, current 'addfilegenerator()' implementation forcibly puts the > > specified file name into 'journal.backupfiles' (or 'undo.backupfiles' > > renamed from it) to restore changed files at aborting (or rollbacking) > > transaction. > > > > Then, using 'addfilegenerator()' always causes restoring dirstate via > > 'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1) > > or not (2-2), even though 'localrepo._rollback()' tries to avoid it in > > case (2-2). As described above, this is unexpected "hg debugsetparent" > > (+ breaking file status). > > I see, this is unfortunate. Is there other working copy specific file we > should be careful about too? AFAIK, there is no other one like that: there are some other working copy sensitive files '.hg/branch' and '.hg/bookmarks.current', but they aren't saved/restored by transaction. > Thanks for the details! > > -- > Pierre-Yves David > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
Patch
diff --git a/mercurial/transaction.py b/mercurial/transaction.py --- a/mercurial/transaction.py +++ b/mercurial/transaction.py @@ -135,8 +135,10 @@ class transaction(object): # holds callback to call when writing the transaction self._finalizecallback = {} # hold callback for post transaction close self._postclosecallback = {} + # holds callbacks to call during abort + self._abortcallback = {} def __del__(self): if self.journal: self._abort() @@ -360,8 +362,19 @@ class transaction(object): """ self._postclosecallback[category] = callback @active + def addabort(self, category, callback): + """add a callback to be called when the transaction is aborted. + + The transaction will be given as the first argument to the callback. + + Category is a unique identifier to allow overwriting an old callback + with a newer callback. + """ + self._abortcallback[category] = callback + + @active def close(self): '''commit the transaction''' if self.count == 1: self._generatefiles() @@ -442,8 +455,10 @@ class transaction(object): self.report(_("transaction abort!\n")) try: + for cat in sorted(self._abortcallback): + self._abortcallback[cat](self) _playback(self.journal, self.report, self.opener, self._vfsmap, self.entries, self._backupentries, False) self.report(_("rollback completed\n")) except Exception: