Patchwork transaction: support for callbacks during abort

login
register
mail settings
Submitter Gregory Szorc
Date Jan. 7, 2015, 6 a.m.
Message ID <5affbb31eea2a1034bc7.1420610418@3.1.168.192.in-addr.arpa>
Download mbox | patch
Permalink /patch/7346/
State Accepted
Commit d486e52352e80426db6f3d738e35bb675b3c3fa2
Headers show

Comments

Gregory Szorc - Jan. 7, 2015, 6 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1420610193 28800
#      Tue Jan 06 21:56:33 2015 -0800
# Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
# Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
transaction: support for callbacks during abort

Previous transaction work added callbacks to be called during regular
transaction commit/close. As part of refactoring Mozilla's pushlog
extension (an extension that opens a SQLite database and tries to tie
its transaction semantics to Mercurial's transaction), I discovered that
the new transaction APIs were insufficient to avoid monkeypatching
transaction instance internals. Adding a callback that is called during
transaction abort removes the necessity for monkeypatching and completes
the API.
Matt Mackall - Jan. 8, 2015, 6:21 p.m.
On Tue, 2015-01-06 at 22:00 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1420610193 28800
> #      Tue Jan 06 21:56:33 2015 -0800
> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
> # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
> transaction: support for callbacks during abort

This is queued for default, thanks.
Pierre-Yves David - Jan. 8, 2015, 10:55 p.m.
On 01/08/2015 10:21 AM, Matt Mackall wrote:
> On Tue, 2015-01-06 at 22:00 -0800, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1420610193 28800
>> #      Tue Jan 06 21:56:33 2015 -0800
>> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
>> # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
>> transaction: support for callbacks during abort
>
> This is queued for default, thanks.

The patch is good, but I recommend setting some was to have to run 
somewhere in the test suite if you want it to stay unbroken.
Katsunori FUJIWARA - July 11, 2015, 10:31 a.m.
At Tue, 06 Jan 2015 22:00:18 -0800,
Gregory Szorc wrote:
> 
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1420610193 28800
> #      Tue Jan 06 21:56:33 2015 -0800
> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
> # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
> transaction: support for callbacks during abort
> 
> Previous transaction work added callbacks to be called during regular
> transaction commit/close. As part of refactoring Mozilla's pushlog
> extension (an extension that opens a SQLite database and tries to tie
> its transaction semantics to Mercurial's transaction), I discovered that
> the new transaction APIs were insufficient to avoid monkeypatching
> transaction instance internals. Adding a callback that is called during
> transaction abort removes the necessity for monkeypatching and completes
> the API.
> 
> diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> --- a/mercurial/transaction.py
> +++ b/mercurial/transaction.py

(snip)

> @@ -442,8 +455,10 @@ class transaction(object):
>  
>              self.report(_("transaction abort!\n"))
>  
>              try:
> +                for cat in sorted(self._abortcallback):
> +                    self._abortcallback[cat](self)
>                  _playback(self.journal, self.report, self.opener, self._vfsmap,
>                            self.entries, self._backupentries, False)
>                  self.report(_("rollback completed\n"))
>              except Exception:

(also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook)

Now, I'm working for dirstate consistency around inside and border of
transaction, and planning to use this abort callback mechanism to know
the end of transaction at failure.

But current implementation doesn't invoke abort callbacks, if there is
no change on repository data at aborting transaction: e.g. "hg import"
fails to import the first patch.

So, there is no easy and certain way to know the end of transaction at
failure.

BTW, this also causes that 'txnabort' hook isn't invoked at aborting
transaction in such case, because 'txnabort' hook invocation is
achieved by abort callback mechanism. This may need some cleanup for
orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
will be invoked subsequently.

Then, please let me confirm some points below before changing around
'transaction._abort()' for my purpose:

  - are there any usecases to expect actual changes to be rollbacked ?

    In other words, does it cause problem to invoke abort callbacks
    regardless of actual changes ?

  - are there any usecases to refer changes to be rollbacked in abort
    callbacks ?

    In other words, does it cause problem to invoke abort callbacks
    after rollbacking ?

    BTW, 'txnabort' hooks are invoked without 'pending', and it means
    that such changes should be invisible to 'txnabort' hooks, doesn't
    it ? > Pierre-Yves

    Then, IMHO, such changes should be invisible to abort callbacks,
    too.

Maybe, should I introduce another callback to know the end of
transaction (regardless of the result of it, or only at failure) ?


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Gregory Szorc - July 11, 2015, 6:06 p.m.
On Sat, Jul 11, 2015 at 3:31 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

> At Tue, 06 Jan 2015 22:00:18 -0800,
> Gregory Szorc wrote:
> >
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1420610193 28800
> > #      Tue Jan 06 21:56:33 2015 -0800
> > # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
> > # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
> > transaction: support for callbacks during abort
> >
> > Previous transaction work added callbacks to be called during regular
> > transaction commit/close. As part of refactoring Mozilla's pushlog
> > extension (an extension that opens a SQLite database and tries to tie
> > its transaction semantics to Mercurial's transaction), I discovered that
> > the new transaction APIs were insufficient to avoid monkeypatching
> > transaction instance internals. Adding a callback that is called during
> > transaction abort removes the necessity for monkeypatching and completes
> > the API.
> >
> > diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> > --- a/mercurial/transaction.py
> > +++ b/mercurial/transaction.py
>
> (snip)
>
> > @@ -442,8 +455,10 @@ class transaction(object):
> >
> >              self.report(_("transaction abort!\n"))
> >
> >              try:
> > +                for cat in sorted(self._abortcallback):
> > +                    self._abortcallback[cat](self)
> >                  _playback(self.journal, self.report, self.opener,
> self._vfsmap,
> >                            self.entries, self._backupentries, False)
> >                  self.report(_("rollback completed\n"))
> >              except Exception:
>
> (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook)
>
> Now, I'm working for dirstate consistency around inside and border of
> transaction, and planning to use this abort callback mechanism to know
> the end of transaction at failure.
>
> But current implementation doesn't invoke abort callbacks, if there is
> no change on repository data at aborting transaction: e.g. "hg import"
> fails to import the first patch.
>
> So, there is no easy and certain way to know the end of transaction at
> failure.
>
> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
> transaction in such case, because 'txnabort' hook invocation is
> achieved by abort callback mechanism. This may need some cleanup for
> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
> will be invoked subsequently.
>
> Then, please let me confirm some points below before changing around
> 'transaction._abort()' for my purpose:
>
>   - are there any usecases to expect actual changes to be rollbacked ?
>
>     In other words, does it cause problem to invoke abort callbacks
>     regardless of actual changes ?
>

Mozilla has a use case in our pushlog extension (
https://hg.mozilla.org/hgcustom/version-control-tools/file/45f5a094da62/hgext/pushlog/__init__.py#l221).
Not sure if there is anything in core. Possibly not.


>
>   - are there any usecases to refer changes to be rollbacked in abort
>     callbacks ?
>
>     In other words, does it cause problem to invoke abort callbacks
>     after rollbacking ?
>

I don't think it matters if registered rollbacks are called before or after
core rollbacks. But, I think it is important that the "rollback completed"
message is printed after all rollbacks are performed. Otherwise, you could
see weird output. e.g.

  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  transaction abort!
  rollback completed
  rolling back message from extension


>     BTW, 'txnabort' hooks are invoked without 'pending', and it means
>     that such changes should be invisible to 'txnabort' hooks, doesn't
>     it ? > Pierre-Yves
>
>     Then, IMHO, such changes should be invisible to abort callbacks,
>     too.
>
> Maybe, should I introduce another callback to know the end of
> transaction (regardless of the result of it, or only at failure) ?
>

I think that would be a nice addition to the API and would round out the
callback support nicely. This could be used for better forensic logging,
for example.
Pierre-Yves David - July 11, 2015, 7:47 p.m.
On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> At Tue, 06 Jan 2015 22:00:18 -0800,
> Gregory Szorc wrote:
>>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1420610193 28800
>> #      Tue Jan 06 21:56:33 2015 -0800
>> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
>> # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
>> transaction: support for callbacks during abort
>>
>> Previous transaction work added callbacks to be called during regular
>> transaction commit/close. As part of refactoring Mozilla's pushlog
>> extension (an extension that opens a SQLite database and tries to tie
>> its transaction semantics to Mercurial's transaction), I discovered that
>> the new transaction APIs were insufficient to avoid monkeypatching
>> transaction instance internals. Adding a callback that is called during
>> transaction abort removes the necessity for monkeypatching and completes
>> the API.
>>
>> diff --git a/mercurial/transaction.py b/mercurial/transaction.py
>> --- a/mercurial/transaction.py
>> +++ b/mercurial/transaction.py
>
> (snip)
>
>> @@ -442,8 +455,10 @@ class transaction(object):
>>
>>               self.report(_("transaction abort!\n"))
>>
>>               try:
>> +                for cat in sorted(self._abortcallback):
>> +                    self._abortcallback[cat](self)
>>                   _playback(self.journal, self.report, self.opener, self._vfsmap,
>>                             self.entries, self._backupentries, False)
>>                   self.report(_("rollback completed\n"))
>>               except Exception:
>
> (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook)
>
> Now, I'm working for dirstate consistency around inside and border of
> transaction, and planning to use this abort callback mechanism to know
> the end of transaction at failure.

I'm a bit curious about what code you need to run in such case. If you 
use the file generator mechanism you do not need to worries about 
restoring backup.

However, I'm not sure how rolling back the file would atomatically 
invalide the in memory object content (probably not). Is this why you 
need a callback? could we (would it make sense to) integrate that as 
part of one of our current cache invalidation mechanism?

> But current implementation doesn't invoke abort callbacks, if there is
> no change on repository data at aborting transaction: e.g. "hg import"
> fails to import the first patch.

The change to dirstate should most probably register himself as change 
to the transaction. That would trigger it.

> So, there is no easy and certain way to know the end of transaction at
> failure.
>
> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
> transaction in such case, because 'txnabort' hook invocation is
> achieved by abort callback mechanism. This may need some cleanup for
> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
> will be invoked subsequently.
>
> Then, please let me confirm some points below before changing around
> 'transaction._abort()' for my purpose:
>
>    - are there any usecases to expect actual changes to be rollbacked ?
>
>      In other words, does it cause problem to invoke abort callbacks
>      regardless of actual changes ?

If I remember correctly, it make "transaction abort" "abort complete 
message to happears in a lot of new place. Basically any place where we 
open a transaction but eventually do nothing. We can probably not change it

>    - are there any usecases to refer changes to be rollbacked in abort
>      callbacks ?
>
>      In other words, does it cause problem to invoke abort callbacks
>      after rollbacking ?

I would says yes, but detail could be complicated (to be continued)

>
>      BTW, 'txnabort' hooks are invoked without 'pending', and it means
>      that such changes should be invisible to 'txnabort' hooks, doesn't
>      it ? > Pierre-Yves

And here we have proof of (1) people do not rely of it too much (2) 
definition of the state of the transaciton during abort is far from trivial.
We should probably keep it with rolled back content/

>
>      Then, IMHO, such changes should be invisible to abort callbacks,
>      too.
>
> Maybe, should I introduce another callback to know the end of
> transaction (regardless of the result of it, or only at failure) ?

Sounds like an awesome idea. Python have:

try:
     .... # equivalent to transaction open
except:
     .... # equivalent to transaction abort
else:
     .... # equivalent to transaction success
finally:
     .... # equivalent of your proposal

Finally is definitly very useful, so go for it.
Katsunori FUJIWARA - July 13, 2015, 6:44 a.m.
At Sat, 11 Jul 2015 20:47:00 +0100,
Pierre-Yves David wrote:

> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> > At Tue, 06 Jan 2015 22:00:18 -0800,
> > Gregory Szorc wrote:
> >>
> >> # HG changeset patch
> >> # User Gregory Szorc <gregory.szorc@gmail.com>
> >> # Date 1420610193 28800
> >> #      Tue Jan 06 21:56:33 2015 -0800
> >> # Node ID 5affbb31eea2a1034bc73ee20182880adcc9af36
> >> # Parent  b9d06fa10ef29c012d48bd4f3c93fd7bf1347d40
> >> transaction: support for callbacks during abort
> >>
> >> Previous transaction work added callbacks to be called during regular
> >> transaction commit/close. As part of refactoring Mozilla's pushlog
> >> extension (an extension that opens a SQLite database and tries to tie
> >> its transaction semantics to Mercurial's transaction), I discovered that
> >> the new transaction APIs were insufficient to avoid monkeypatching
> >> transaction instance internals. Adding a callback that is called during
> >> transaction abort removes the necessity for monkeypatching and completes
> >> the API.
> >>
> >> diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> >> --- a/mercurial/transaction.py
> >> +++ b/mercurial/transaction.py
> >
> > (snip)
> >
> >> @@ -442,8 +455,10 @@ class transaction(object):
> >>
> >>               self.report(_("transaction abort!\n"))
> >>
> >>               try:
> >> +                for cat in sorted(self._abortcallback):
> >> +                    self._abortcallback[cat](self)
> >>                   _playback(self.journal, self.report, self.opener, self._vfsmap,
> >>                             self.entries, self._backupentries, False)
> >>                   self.report(_("rollback completed\n"))
> >>               except Exception:
> >
> > (also CC-ed to Pierre-Yves as a implementer of 'txnabort' hook)

Thank you for your comments, Gregory and Pierre-Yves.

I'll work to add another callback to know the end of transaction, as
described later. So, some of comments below are just for your
information.


> > Now, I'm working for dirstate consistency around inside and border of
> > transaction, and planning to use this abort callback mechanism to know
> > the end of transaction at failure.
> 
> I'm a bit curious about what code you need to run in such case. If you 
> use the file generator mechanism you do not need to worries about 
> restoring backup.
> 
> However, I'm not sure how rolling back the file would atomatically 
> invalide the in memory object content (probably not). Is this why you 
> need a callback? could we (would it make sense to) integrate that as 
> part of one of our current cache invalidation mechanism?


I'm planning to use callback mechanism for purposes below:

1. to make existing 'dirstate.write()' invocations write in-memory
   changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
   'dirstate.write()' should know whether transaction is running or not

   I'm planning changes below for this purpose:

   - invoke (newly added) 'dirstate.begintransaction()' at the
     beginning of transaction in 'localrepo.transaction()'

   - invoke (newly added) 'dirstate.endtransaction()' at the end of
     transaction via "the end of transaction" callback (instead of
     finalize/abort callbacks)


   There are some other ways:

   1-1. instantiate dirstate with 'repo'
        (or weak ref of repo to avoid circular reference ?)

   1-2. make 'dirstate.write()' require 'localrepo' argument

        This seems too large impact (especially for 3rd party
        extensions, even though Mercurial doesn't ensure compatibility
        of internal API at all :-))

   In both cases, 'dirstate.write()' can determine appropriate output
   file by examination of 'repo.currenttransaction()'.


2. in-memory dirstate changes should be discarded at failure of
   transaction, to prevent them from being written out at (outer)
   'wlock.release' or so

   I'm planning to do this via "the end of transaction" callback.


   Using 'dirstate.{begin|end}parentchange()' mechanism as below also
   works similarly.

   - invoke beginparentchange at starting transaction
   - invoke endparentchange at (successfully) closing transaction
   - wlock.release at failure invokes 'dirstate.invalidate()'

   But it seems too optimistic for outside transaction :-), and it
   can't ensure consistency just at the end of transaction.



> > But current implementation doesn't invoke abort callbacks, if there is
> > no change on repository data at aborting transaction: e.g. "hg import"
> > fails to import the first patch.
> 
> The change to dirstate should most probably register himself as change 
> to the transaction. That would trigger it.

I found two problems of 'addfilegenerator()' below. IMHO, the former
is a major problem, and the latter is minor one (but you should
dislike it as later :-)).

And, both of my fixing plans cause omitting invocation of abort
callbacks, because they makes 'transaction._backupentries' empty :-<

- current 'addfilegenerator()' causes forcibly restoring
  '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
  parents of the working directory

  This causes unexpected "hg status" output at rollback-ing in such
  situation (aka non-"parentgone" case).

  I'm planning to introduce 'backup' optional argument, which controls
  'addbackup()' invocation in '_generatefiles()' to avoid issue above,
  to 'addfilegenerator()'.

    https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267

  According to discussion before, dirstate writes changes into (a)
  '.hg/dirstate' at the beginning of transaction and (b)
  '.hg/dirstate.pending' while transaction running. Then, discarding
  the latter can work as a kind of restoring. So, omitting
  'addbackup()' invocation doesn't cause problem at least for dirstate.

  Or is another trick like "add to _backupentries, but omit backup
  itself" better ?

- current 'addfilegenerator()' causes forcibly writing data out at
  'writepending()', even if data hasn't been changed since last
  writing out

  When external hooks are invoked while transaction, this causes
  additional "transaction abort!" and "rollback completed" output,
  even though dirstate hasn't been changed since the beginning of
  transaction.

  For example, pushing changes/bookmarks causes writing into
  'dirstate.pending' at REMOTE side, even though it never uses and
  changes dirstate.

  I'm planning to introduce 'checkdirty' optional argument, which is
  used to examine whether data has been changed since last writing
  out (= 'genfunc' should be invoked or not), to avoid redundant
  writing data out.


> > So, there is no easy and certain way to know the end of transaction at
> > failure.
> >
> > BTW, this also causes that 'txnabort' hook isn't invoked at aborting
> > transaction in such case, because 'txnabort' hook invocation is
> > achieved by abort callback mechanism. This may need some cleanup for
> > orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
> > will be invoked subsequently.
> >
> > Then, please let me confirm some points below before changing around
> > 'transaction._abort()' for my purpose:
> >
> >    - are there any usecases to expect actual changes to be rollbacked ?
> >
> >      In other words, does it cause problem to invoke abort callbacks
> >      regardless of actual changes ?
> 
> If I remember correctly, it make "transaction abort" "abort complete 
> message to happears in a lot of new place. Basically any place where we 
> open a transaction but eventually do nothing. We can probably not change it

Adding invocation of abort callbacks just before cleaning journal
files up can avoid such additional messages, even though duplicated
code paths are redundant :-)

    https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477

    diff --git a/mercurial/transaction.py b/mercurial/transaction.py
    --- a/mercurial/transaction.py
    +++ b/mercurial/transaction.py
    @@ -482,6 +482,8 @@ class transaction(object):
     
             try:
                 if not self.entries and not self._backupentries:
    +                for cat in sorted(self._abortcallback):
    +                    self._abortcallback[cat](self)
                     if self.journal:
                         self.opener.unlink(self.journal)
                     if self._backupjournal:


> >    - are there any usecases to refer changes to be rollbacked in abort
> >      callbacks ?
> >
> >      In other words, does it cause problem to invoke abort callbacks
> >      after rollbacking ?
> 
> I would says yes, but detail could be complicated (to be continued)
> 
> >
> >      BTW, 'txnabort' hooks are invoked without 'pending', and it means
> >      that such changes should be invisible to 'txnabort' hooks, doesn't
> >      it ? > Pierre-Yves
> 
> And here we have proof of (1) people do not rely of it too much (2) 
> definition of the state of the transaciton during abort is far from trivial.
> We should probably keep it with rolled back content/

It means that 'txnabort' hooks should be invoked with 'pending' (=
rolled back content should be visible to them), doen't it ? (sorry for
my poor English, if I misunderstood :-<)


> >
> >      Then, IMHO, such changes should be invisible to abort callbacks,
> >      too.
> >
> > Maybe, should I introduce another callback to know the end of
> > transaction (regardless of the result of it, or only at failure) ?
> 
> Sounds like an awesome idea. Python have:
> 
> try:
>      .... # equivalent to transaction open
> except:
>      .... # equivalent to transaction abort
> else:
>      .... # equivalent to transaction success
> finally:
>      .... # equivalent of your proposal
> 
> Finally is definitly very useful, so go for it.

I'll do so.


> -- 
> Pierre-Yves David
> 
> 
> 
> 
> 
> 
> >
> >
> > ----------------------------------------------------------------------
> > [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
> >
> 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - July 14, 2015, 11:26 a.m.
On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
>
>
> At Sat, 11 Jul 2015 20:47:00 +0100,
> Pierre-Yves David wrote:
>
>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
>>> Now, I'm working for dirstate consistency around inside and border of
>>> transaction, and planning to use this abort callback mechanism to know
>>> the end of transaction at failure.
>>
>> I'm a bit curious about what code you need to run in such case. If you
>> use the file generator mechanism you do not need to worries about
>> restoring backup.
>>
>> However, I'm not sure how rolling back the file would atomatically
>> invalide the in memory object content (probably not). Is this why you
>> need a callback? could we (would it make sense to) integrate that as
>> part of one of our current cache invalidation mechanism?
>
>
> I'm planning to use callback mechanism for purposes below:
>
> 1. to make existing 'dirstate.write()' invocations write in-memory
>     changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
>     'dirstate.write()' should know whether transaction is running or not
>
>     I'm planning changes below for this purpose:
>
>     - invoke (newly added) 'dirstate.begintransaction()' at the
>       beginning of transaction in 'localrepo.transaction()'
>
>     - invoke (newly added) 'dirstate.endtransaction()' at the end of
>       transaction via "the end of transaction" callback (instead of
>       finalize/abort callbacks)
>
>
>     There are some other ways:
>
>     1-1. instantiate dirstate with 'repo'
>          (or weak ref of repo to avoid circular reference ?)
>
>     1-2. make 'dirstate.write()' require 'localrepo' argument
>
>          This seems too large impact (especially for 3rd party
>          extensions, even though Mercurial doesn't ensure compatibility
>          of internal API at all :-))
>
>     In both cases, 'dirstate.write()' can determine appropriate output
>     file by examination of 'repo.currenttransaction()'.

I would prefer one of this options. Using hooks for such core code looks 
a bit hacky and fragile. I would be happy with wither 1-1 (weak-ref 
please) or 1-2. With a preference for 1-2.

You can make the repo argument optional, with a devel warning if it is 
missing for example. To avoid too much breakage for a couple of version.

> 2. in-memory dirstate changes should be discarded at failure of
>     transaction, to prevent them from being written out at (outer)
>     'wlock.release' or so
>
>     I'm planning to do this via "the end of transaction" callback.
>
>
>     Using 'dirstate.{begin|end}parentchange()' mechanism as below also
>     works similarly.
>
>     - invoke beginparentchange at starting transaction
>     - invoke endparentchange at (successfully) closing transaction
>     - wlock.release at failure invokes 'dirstate.invalidate()'
>
>     But it seems too optimistic for outside transaction :-), and it
>     can't ensure consistency just at the end of transaction.

I'm a bit confused about why you would invoe dirstate.invalidate() at 
lock release. Can't we have something invalidating the dirstate content 
when transaction if rolledback (so that dirstate is read from disk?

Do we have something forcing dirstate flushing at the transaction 
openning? (so that the backup we restore is okay?)

>>> But current implementation doesn't invoke abort callbacks, if there is
>>> no change on repository data at aborting transaction: e.g. "hg import"
>>> fails to import the first patch.
>>
>> The change to dirstate should most probably register himself as change
>> to the transaction. That would trigger it.
>
> I found two problems of 'addfilegenerator()' below. IMHO, the former
> is a major problem, and the latter is minor one (but you should
> dislike it as later :-)).

If addfilegenerator fails to fit our needs here, we should probably 
hammer it so it does.

> And, both of my fixing plans cause omitting invocation of abort
> callbacks, because they makes 'transaction._backupentries' empty :-<
>
> - current 'addfilegenerator()' causes forcibly restoring
>    '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
>    parents of the working directory
>
>    This causes unexpected "hg status" output at rollback-ing in such
>    situation (aka non-"parentgone" case).

Do you mean there is category of change to dirstate that must not be 
rolledback if transaction fails? I do not see data about this on the 
wikipage (but I could be blind) Can you elaborate?

I'm not sure to understand the issue here.

>    I'm planning to introduce 'backup' optional argument, which controls
>    'addbackup()' invocation in '_generatefiles()' to avoid issue above,
>    to 'addfilegenerator()'.
>
>      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267

That sound wrong to me. We either have change to be apply on transaction 
success and rolledback on transaction failure. But I can't see case were 
backup will be wrong.


>    According to discussion before, dirstate writes changes into (a)
>    '.hg/dirstate' at the beginning of transaction and (b)
>    '.hg/dirstate.pending' while transaction running. Then, discarding
>    the latter can work as a kind of restoring. So, omitting
>    'addbackup()' invocation doesn't cause problem at least for dirstate.
>
>    Or is another trick like "add to _backupentries, but omit backup
>    itself" better ?

The backup are also used for rollback (and hg recover). We must keep it.

>
> - current 'addfilegenerator()' causes forcibly writing data out at
>    'writepending()', even if data hasn't been changed since last
>    writing out
>
>    When external hooks are invoked while transaction, this causes
>    additional "transaction abort!" and "rollback completed" output,
>    even though dirstate hasn't been changed since the beginning of
>    transaction.
>
>    For example, pushing changes/bookmarks causes writing into
>    'dirstate.pending' at REMOTE side, even though it never uses and
>    changes dirstate.

If there is no change to the dirstate. Why are we recording any change 
into the transaction at all?

>    I'm planning to introduce 'checkdirty' optional argument, which is
>    used to examine whether data has been changed since last writing
>    out (= 'genfunc' should be invoked or not), to avoid redundant
>    writing data out.

Yes, this way of detecting dirtyness have been bothering me in the past. 
I've a small series to clean this us, let me undust it and patchbomb it.

>>> So, there is no easy and certain way to know the end of transaction at
>>> failure.
>>>
>>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
>>> transaction in such case, because 'txnabort' hook invocation is
>>> achieved by abort callback mechanism. This may need some cleanup for
>>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
>>> will be invoked subsequently.
>>>
>>> Then, please let me confirm some points below before changing around
>>> 'transaction._abort()' for my purpose:
>>>
>>>     - are there any usecases to expect actual changes to be rollbacked ?
>>>
>>>       In other words, does it cause problem to invoke abort callbacks
>>>       regardless of actual changes ?
>>
>> If I remember correctly, it make "transaction abort" "abort complete
>> message to happears in a lot of new place. Basically any place where we
>> open a transaction but eventually do nothing. We can probably not change it
>
> Adding invocation of abort callbacks just before cleaning journal
> files up can avoid such additional messages, even though duplicated
> code paths are redundant :-)
>
>      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477
>
>      diff --git a/mercurial/transaction.py b/mercurial/transaction.py
>      --- a/mercurial/transaction.py
>      +++ b/mercurial/transaction.py
>      @@ -482,6 +482,8 @@ class transaction(object):
>
>               try:
>                   if not self.entries and not self._backupentries:
>      +                for cat in sorted(self._abortcallback):
>      +                    self._abortcallback[cat](self)
>                       if self.journal:
>                           self.opener.unlink(self.journal)
>                       if self._backupjournal:

I do not understand what we are trying to achieve here. Having a wider 
definition of abort (abort apply even if transaction is empty) seems a 
dubious semantic and will have huge impact in the UI.


>
>
>>>     - are there any usecases to refer changes to be rollbacked in abort
>>>       callbacks ?
>>>
>>>       In other words, does it cause problem to invoke abort callbacks
>>>       after rollbacking ?
>>
>> I would says yes, but detail could be complicated (to be continued)
>>
>>>
>>>       BTW, 'txnabort' hooks are invoked without 'pending', and it means
>>>       that such changes should be invisible to 'txnabort' hooks, doesn't
>>>       it ? > Pierre-Yves
>>
>> And here we have proof of (1) people do not rely of it too much (2)
>> definition of the state of the transaction during abort is far from trivial.
>> We should probably keep it with rolled back content/
>
> It means that 'txnabort' hooks should be invoked with 'pending' (=
> rolled back content should be visible to them), doen't it ? (sorry for
> my poor English, if I misunderstood :-<)

No. It means that:

1) Nobody complains about the lack of pending so far (since we do not 
have it).

2) I'm not sure we can have anything sensible done here (reading pending 
with a transaction half applied, will be bad)

Conclusion: we should keep not having pending there.

>>>       Then, IMHO, such changes should be invisible to abort callbacks,
>>>       too.
>>>
>>> Maybe, should I introduce another callback to know the end of
>>> transaction (regardless of the result of it, or only at failure) ?
>>
>> Sounds like an awesome idea. Python have:
>>
>> try:
>>       .... # equivalent to transaction open
>> except:
>>       .... # equivalent to transaction abort
>> else:
>>       .... # equivalent to transaction success
>> finally:
>>       .... # equivalent of your proposal
>>
>> Finally is definitely very useful, so go for it.
>
> I'll do so.

Cool, (but It is not clear it is on the critical path, so no rush).
Katsunori FUJIWARA - July 16, 2015, 2:43 p.m.
At Tue, 14 Jul 2015 13:26:14 +0200,
Pierre-Yves David wrote:
>
> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
> >
> >
> > At Sat, 11 Jul 2015 20:47:00 +0100,
> > Pierre-Yves David wrote:
> >
> >> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> >>> Now, I'm working for dirstate consistency around inside and border of
> >>> transaction, and planning to use this abort callback mechanism to know
> >>> the end of transaction at failure.
> >>
> >> I'm a bit curious about what code you need to run in such case. If you
> >> use the file generator mechanism you do not need to worries about
> >> restoring backup.
> >>
> >> However, I'm not sure how rolling back the file would atomatically
> >> invalide the in memory object content (probably not). Is this why you
> >> need a callback? could we (would it make sense to) integrate that as
> >> part of one of our current cache invalidation mechanism?
> >
> >
> > I'm planning to use callback mechanism for purposes below:
> >
> > 1. to make existing 'dirstate.write()' invocations write in-memory
> >     changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
> >     'dirstate.write()' should know whether transaction is running or not
> >
> >     I'm planning changes below for this purpose:
> >
> >     - invoke (newly added) 'dirstate.begintransaction()' at the
> >       beginning of transaction in 'localrepo.transaction()'
> >
> >     - invoke (newly added) 'dirstate.endtransaction()' at the end of
> >       transaction via "the end of transaction" callback (instead of
> >       finalize/abort callbacks)
> >
> >
> >     There are some other ways:
> >
> >     1-1. instantiate dirstate with 'repo'
> >          (or weak ref of repo to avoid circular reference ?)
> >
> >     1-2. make 'dirstate.write()' require 'localrepo' argument
> >
> >          This seems too large impact (especially for 3rd party
> >          extensions, even though Mercurial doesn't ensure compatibility
> >          of internal API at all :-))
> >
> >     In both cases, 'dirstate.write()' can determine appropriate output
> >     file by examination of 'repo.currenttransaction()'.
>
> I would prefer one of this options. Using hooks for such core code looks
> a bit hacky and fragile.

How about something new like 'unlock' of 'lock' (or 'after' of
transaction) ? Would you think this kind of 'call back' mechanism
itself is also fragile for core code (if newly added) ?


> I would be happy with wither 1-1 (weak-ref
> please) or 1-2. With a preference for 1-2.
>
> You can make the repo argument optional, with a devel warning if it is
> missing for example. To avoid too much breakage for a couple of version.

BTW, what is your strong reason to choose 1-2 ? to avoid using
weak ref of repo ?



> > 2. in-memory dirstate changes should be discarded at failure of
> >     transaction, to prevent them from being written out at (outer)
> >     'wlock.release' or so
> >
> >     I'm planning to do this via "the end of transaction" callback.
> >
> >
> >     Using 'dirstate.{begin|end}parentchange()' mechanism as below also
> >     works similarly.
> >
> >     - invoke beginparentchange at starting transaction
> >     - invoke endparentchange at (successfully) closing transaction
> >     - wlock.release at failure invokes 'dirstate.invalidate()'
> >
> >     But it seems too optimistic for outside transaction :-), and it
> >     can't ensure consistency just at the end of transaction.
>
> I'm a bit confused about why you would invoe dirstate.invalidate() at
> lock release. Can't we have something invalidating the dirstate content
> when transaction if rolledback (so that dirstate is read from disk?

(at first, 'dirstate.{begin|end}parentchange()' example above is
just a example to do similar thing in another way, but not what I
want to do)

> Do we have something forcing dirstate flushing at the transaction
> openning? (so that the backup we restore is okay?)

AFAIK, with current implementation:

- in-memory dirstate changes aren't explicitly written out at opening
  transaction

  In almost all cases, in-memory dirstate changes seem to be
  INDIRECTLY written out before transaction opening. But it isn't
  explicitly ensured.

  This causes:

  1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at
     opening transaction, and

  2. restoring one from (incorrect) '.hg/undo.dirstate', which is
     renamed from '.hg/journal.dirstate' above, at rollback

  (Oops, I have overlooked that journal file is created not from
  dirstate instance but from '.hg/dirstate' file at opening
  transaction :-< we have to fix this at first)

  For example, at "hg backout --commit", dirstate changes via
  reverting aren't written out until 'ctx.markcommitted()'
  after closing the transaction. And, "hg status" after "hg
  rollback" shows unexpected status for added/removed files.

  BTW, after my previous change fe03f522dda9, this issue can be
  reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing
  on some unsure files :-)

    https://selenic.com/repo/hg/rev/fe03f522dda9

- in-memory dirstate changes aren't explicitly discarded at
  aborting transaction

  (for convenience, let's assume that '.hg/journal.dirstate' is
  correct, at this point)

  As you described, aborting transaction restores '.hg/dirstate' from
  '.hg/journal.dirstate' saved at opening transaction.

  But in-memory changes during transaction are still kept in
  'dirstate' instance, and they may be written into already restored
  '.hg/dirstate' at 'wlock.release()' or so.

  BTW, commands almost fully enclosed by transaction treat "in-memory
  changes during aborted transaction" in ways below:

  - in-memory changes are discarded by 'dirstateguard' OUTSIDE
    transaction scope
    ('hg qpush', 'hg commit --amend', and 'hg import')

    "discarding in-memory dirstate changes at aborting
    transaction" will make dirstateguard in these cases
    useless.

  - in-memory changes should be discarded, but not
    ('hg transplant')

    Aborting for other than conflict may cause unexpected dirstate
    (e.g. referring rollbacked revision as parent). This should be
    fixed.

  - discarding in-memory changes causes problem
    ('hg shelve' and 'hg unshelve')

    Current shelve/unshelve implementation expects aborting
    transaction NOT to discard in-memory dirstate changes intentionally
    or unintentionally.

    1. start transaction
    2. do shelve/unshelve, and this makes temporary revisions
    3. update working directory to the parent revision at (1)
       (+ some additional actions)
       - dirstate is different from one at (1), because
         some changes are shelved/unshelved
    4. abort transaction intentionally
       - rollback revisions created at (2)
       - restore '.hg/dirstate' from one at (1)
    5. write dirstate at 'wlock.release'
       - here, '.hg/dirstate' is equal to one at (3)

    IMHO, "strip by aborting current transaction" at (4) above
    should be replaced by "strip by rollbacking the transaction
    after once closing it".

    The latter doesn't strip the parent of the working directory at
    stripping because of updating at (3): this is non-"parentgone"
    case. Then, dirstate is kept as same as one at (3) above.

Let me confirm how dirstate should be treated at the border of
transaction scope.

- at opening transaction:

  - write in-memory dirstate changes into '.hg/dirstate' (not yet)

- at successfully closing transaction:

  - write changes during transaction into '.hg/dirstate' (not yet)
  - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)

- at aborting transaction:

  - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
  - discard in-memory dirstate changes (not yet)


> >>> But current implementation doesn't invoke abort callbacks, if there is
> >>> no change on repository data at aborting transaction: e.g. "hg import"
> >>> fails to import the first patch.
> >>
> >> The change to dirstate should most probably register himself as change
> >> to the transaction. That would trigger it.
> >
> > I found two problems of 'addfilegenerator()' below. IMHO, the former
> > is a major problem, and the latter is minor one (but you should
> > dislike it as later :-)).
>
> If addfilegenerator fails to fit our needs here, we should probably
> hammer it so it does.
>
> > And, both of my fixing plans cause omitting invocation of abort
> > callbacks, because they makes 'transaction._backupentries' empty :-<
> >
> > - current 'addfilegenerator()' causes forcibly restoring
> >    '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
> >    parents of the working directory
> >
> >    This causes unexpected "hg status" output at rollback-ing in such
> >    situation (aka non-"parentgone" case).
>
> Do you mean there is category of change to dirstate that must not be
> rolledback if transaction fails? I do not see data about this on the
> wikipage (but I could be blind) Can you elaborate?
>
> I'm not sure to understand the issue here.

"rollback" above is not one by aborting while transaction running, but
one by "hg rollback"/"repo.rollback()" after once closing transaction.

In the later case, parents of the working directory may differ
from one at opening transaction, and forcible restoring dirstate
causes unexpected result of subsequent "hg status".

Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
restored at rollback, and it causes forcible restoring.

'repo.rollback()' avoids this problem by restoring dirstate by itself.

    https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110


> >    I'm planning to introduce 'backup' optional argument, which controls
> >    'addbackup()' invocation in '_generatefiles()' to avoid issue above,
> >    to 'addfilegenerator()'.
> >
> >      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267
>
> That sound wrong to me. We either have change to be apply on transaction
> success and rolledback on transaction failure. But I can't see case were
> backup will be wrong.
>
>
> >    According to discussion before, dirstate writes changes into (a)
> >    '.hg/dirstate' at the beginning of transaction and (b)
> >    '.hg/dirstate.pending' while transaction running. Then, discarding
> >    the latter can work as a kind of restoring. So, omitting
> >    'addbackup()' invocation doesn't cause problem at least for dirstate.
> >
> >    Or is another trick like "add to _backupentries, but omit backup
> >    itself" better ?
>
> The backup are also used for rollback (and hg recover). We must keep it.
>
> >
> > - current 'addfilegenerator()' causes forcibly writing data out at
> >    'writepending()', even if data hasn't been changed since last
> >    writing out
> >
> >    When external hooks are invoked while transaction, this causes
> >    additional "transaction abort!" and "rollback completed" output,
> >    even though dirstate hasn't been changed since the beginning of
> >    transaction.
> >
> >    For example, pushing changes/bookmarks causes writing into
> >    'dirstate.pending' at REMOTE side, even though it never uses and
> >    changes dirstate.
>
> If there is no change to the dirstate. Why are we recording any change
> into the transaction at all?

I understood that 'addfilegenerator()' is used for files which may be
changed during transaction like "journal", and invoked it for dirstate
at opening transaction.

Should I use it after examination whether dirstate has been changed
since opening transaction ?


> >    I'm planning to introduce 'checkdirty' optional argument, which is
> >    used to examine whether data has been changed since last writing
> >    out (= 'genfunc' should be invoked or not), to avoid redundant
> >    writing data out.
>
> Yes, this way of detecting dirtyness have been bothering me in the past.
> I've a small series to clean this us, let me undust it and patchbomb it.
>
> >>> So, there is no easy and certain way to know the end of transaction at
> >>> failure.
> >>>
> >>> BTW, this also causes that 'txnabort' hook isn't invoked at aborting
> >>> transaction in such case, because 'txnabort' hook invocation is
> >>> achieved by abort callback mechanism. This may need some cleanup for
> >>> orphan 'pretxnopen' hook assuming that one of 'txnclose' or 'txnabort'
> >>> will be invoked subsequently.
> >>>
> >>> Then, please let me confirm some points below before changing around
> >>> 'transaction._abort()' for my purpose:
> >>>
> >>>     - are there any usecases to expect actual changes to be rollbacked ?
> >>>
> >>>       In other words, does it cause problem to invoke abort callbacks
> >>>       regardless of actual changes ?
> >>
> >> If I remember correctly, it make "transaction abort" "abort complete
> >> message to happears in a lot of new place. Basically any place where we
> >> open a transaction but eventually do nothing. We can probably not change it
> >
> > Adding invocation of abort callbacks just before cleaning journal
> > files up can avoid such additional messages, even though duplicated
> > code paths are redundant :-)
> >
> >      https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l477
> >
> >      diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> >      --- a/mercurial/transaction.py
> >      +++ b/mercurial/transaction.py
> >      @@ -482,6 +482,8 @@ class transaction(object):
> >
> >               try:
> >                   if not self.entries and not self._backupentries:
> >      +                for cat in sorted(self._abortcallback):
> >      +                    self._abortcallback[cat](self)
> >                       if self.journal:
> >                           self.opener.unlink(self.journal)
> >                       if self._backupjournal:
>
> I do not understand what we are trying to achieve here. Having a wider
> definition of abort (abort apply even if transaction is empty) seems a
> dubious semantic and will have huge impact in the UI.
>
>
> >
> >
> >>>     - are there any usecases to refer changes to be rollbacked in abort
> >>>       callbacks ?
> >>>
> >>>       In other words, does it cause problem to invoke abort callbacks
> >>>       after rollbacking ?
> >>
> >> I would says yes, but detail could be complicated (to be continued)
> >>
> >>>
> >>>       BTW, 'txnabort' hooks are invoked without 'pending', and it means
> >>>       that such changes should be invisible to 'txnabort' hooks, doesn't
> >>>       it ? > Pierre-Yves
> >>
> >> And here we have proof of (1) people do not rely of it too much (2)
> >> definition of the state of the transaction during abort is far from trivial.
> >> We should probably keep it with rolled back content/
> >
> > It means that 'txnabort' hooks should be invoked with 'pending' (=
> > rolled back content should be visible to them), doen't it ? (sorry for
> > my poor English, if I misunderstood :-<)
>
> No. It means that:
>
> 1) Nobody complains about the lack of pending so far (since we do not
> have it).
>
> 2) I'm not sure we can have anything sensible done here (reading pending
> with a transaction half applied, will be bad)
>
> Conclusion: we should keep not having pending there.

OK, I see.


> >>>       Then, IMHO, such changes should be invisible to abort callbacks,
> >>>       too.
> >>>
> >>> Maybe, should I introduce another callback to know the end of
> >>> transaction (regardless of the result of it, or only at failure) ?
> >>
> >> Sounds like an awesome idea. Python have:
> >>
> >> try:
> >>       .... # equivalent to transaction open
> >> except:
> >>       .... # equivalent to transaction abort
> >> else:
> >>       .... # equivalent to transaction success
> >> finally:
> >>       .... # equivalent of your proposal
> >>
> >> Finally is definitely very useful, so go for it.
> >
> > I'll do so.
>
> Cool, (but It is not clear it is on the critical path, so no rush).
>
> --
> Pierre-Yves David
>


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - July 17, 2015, 11:48 p.m.
On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
>
> At Tue, 14 Jul 2015 13:26:14 +0200,
> Pierre-Yves David wrote:
>>
>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
>>>
>>>
>>> At Sat, 11 Jul 2015 20:47:00 +0100,
>>> Pierre-Yves David wrote:
>>>
>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
>>>>> Now, I'm working for dirstate consistency around inside and border of
>>>>> transaction, and planning to use this abort callback mechanism to know
>>>>> the end of transaction at failure.
>>>>
>>>> I'm a bit curious about what code you need to run in such case. If you
>>>> use the file generator mechanism you do not need to worries about
>>>> restoring backup.
>>>>
>>>> However, I'm not sure how rolling back the file would atomatically
>>>> invalide the in memory object content (probably not). Is this why you
>>>> need a callback? could we (would it make sense to) integrate that as
>>>> part of one of our current cache invalidation mechanism?
>>>
>>>
>>> I'm planning to use callback mechanism for purposes below:
>>>
>>> 1. to make existing 'dirstate.write()' invocations write in-memory
>>>      changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
>>>      'dirstate.write()' should know whether transaction is running or not
>>>
>>>      I'm planning changes below for this purpose:
>>>
>>>      - invoke (newly added) 'dirstate.begintransaction()' at the
>>>        beginning of transaction in 'localrepo.transaction()'
>>>
>>>      - invoke (newly added) 'dirstate.endtransaction()' at the end of
>>>        transaction via "the end of transaction" callback (instead of
>>>        finalize/abort callbacks)
>>>
>>>
>>>      There are some other ways:
>>>
>>>      1-1. instantiate dirstate with 'repo'
>>>           (or weak ref of repo to avoid circular reference ?)
>>>
>>>      1-2. make 'dirstate.write()' require 'localrepo' argument
>>>
>>>           This seems too large impact (especially for 3rd party
>>>           extensions, even though Mercurial doesn't ensure compatibility
>>>           of internal API at all :-))
>>>
>>>      In both cases, 'dirstate.write()' can determine appropriate output
>>>      file by examination of 'repo.currenttransaction()'.
>>
>> I would prefer one of this options. Using hooks for such core code looks
>> a bit hacky and fragile.
>
> How about something new like 'unlock' of 'lock' (or 'after' of
> transaction) ? Would you think this kind of 'call back' mechanism
> itself is also fragile for core code (if newly added) ?

I do not understand what you mean.

I'm worried about hook approach being "fragile" because it dirstate 
defines very important data for the understanding of the repo. Having it 
a the same place as other extensions and config hook will probably lead 
to some ordering hell.

>> I would be happy with wither 1-1 (weak-ref
>> please) or 1-2. With a preference for 1-2.
>>
>> You can make the repo argument optional, with a devel warning if it is
>> missing for example. To avoid too much breakage for a couple of version.
>
> BTW, what is your strong reason to choose 1-2 ? to avoid using
> weak ref of repo ?

I do not have a strong opinion about 1-1 vs 1-2.

1-2 avoid keeping the cyclic reference in the object design. We use the 
pattern of "providing a repo" when needed in multiple place. It also 
help refactoring code to have the argument explicitly passed. because it 
ensure function calling write have access to all necessary argument 
(here repo).

I think Matt have a strong opinion agains weakref.

>>> 2. in-memory dirstate changes should be discarded at failure of
>>>      transaction, to prevent them from being written out at (outer)
>>>      'wlock.release' or so
>>>
>>>      I'm planning to do this via "the end of transaction" callback.
>>>
>>>
>>>      Using 'dirstate.{begin|end}parentchange()' mechanism as below also
>>>      works similarly.
>>>
>>>      - invoke beginparentchange at starting transaction
>>>      - invoke endparentchange at (successfully) closing transaction
>>>      - wlock.release at failure invokes 'dirstate.invalidate()'
>>>
>>>      But it seems too optimistic for outside transaction :-), and it
>>>      can't ensure consistency just at the end of transaction.
>>
>> I'm a bit confused about why you would invoe dirstate.invalidate() at
>> lock release. Can't we have something invalidating the dirstate content
>> when transaction if rolledback (so that dirstate is read from disk?
>
> (at first, 'dirstate.{begin|end}parentchange()' example above is
> just a example to do similar thing in another way, but not what I
> want to do)
>
>> Do we have something forcing dirstate flushing at the transaction
>> openning? (so that the backup we restore is okay?)
>
> AFAIK, with current implementation:
>
> - in-memory dirstate changes aren't explicitly written out at opening
>    transaction
>
>    In almost all cases, in-memory dirstate changes seem to be
>    INDIRECTLY written out before transaction opening. But it isn't
>    explicitly ensured.
>
>    This causes:
>
>    1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at
>       opening transaction, and
>
>    2. restoring one from (incorrect) '.hg/undo.dirstate', which is
>       renamed from '.hg/journal.dirstate' above, at rollback
>
>    (Oops, I have overlooked that journal file is created not from
>    dirstate instance but from '.hg/dirstate' file at opening
>    transaction :-< we have to fix this at first)
>
>    For example, at "hg backout --commit", dirstate changes via
>    reverting aren't written out until 'ctx.markcommitted()'
>    after closing the transaction. And, "hg status" after "hg
>    rollback" shows unexpected status for added/removed files.
>
>    BTW, after my previous change fe03f522dda9, this issue can be
>    reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing
>    on some unsure files :-)
>
>      https://selenic.com/repo/hg/rev/fe03f522dda9

It seems like we should never start a transaction with a dirty dirstate. 
Is there any theoritical reason for not doing it?


>
> - in-memory dirstate changes aren't explicitly discarded at
>    aborting transaction
>
>    (for convenience, let's assume that '.hg/journal.dirstate' is
>    correct, at this point)
>
>    As you described, aborting transaction restores '.hg/dirstate' from
>    '.hg/journal.dirstate' saved at opening transaction.
>
>    But in-memory changes during transaction are still kept in
>    'dirstate' instance, and they may be written into already restored
>    '.hg/dirstate' at 'wlock.release()' or so.

This sounds wrong (but less keep moving)

>
>    BTW, commands almost fully enclosed by transaction treat "in-memory
>    changes during aborted transaction" in ways below:
>
>    - in-memory changes are discarded by 'dirstateguard' OUTSIDE
>      transaction scope
>      ('hg qpush', 'hg commit --amend', and 'hg import')
>
>      "discarding in-memory dirstate changes at aborting
>      transaction" will make dirstateguard in these cases
>      useless.
>
>    - in-memory changes should be discarded, but not
>      ('hg transplant')

I do not understand this sentence.

>      Aborting for other than conflict may cause unexpected dirstate
>      (e.g. referring rollbacked revision as parent). This should be
>      fixed.
>
>    - discarding in-memory changes causes problem
>      ('hg shelve' and 'hg unshelve')
>
>      Current shelve/unshelve implementation expects aborting
>      transaction NOT to discard in-memory dirstate changes intentionally
>      or unintentionally.
>
>      1. start transaction
>      2. do shelve/unshelve, and this makes temporary revisions
>      3. update working directory to the parent revision at (1)
>         (+ some additional actions)
>         - dirstate is different from one at (1), because
>           some changes are shelved/unshelved
>      4. abort transaction intentionally
>         - rollback revisions created at (2)
>         - restore '.hg/dirstate' from one at (1)
>      5. write dirstate at 'wlock.release'
>         - here, '.hg/dirstate' is equal to one at (3)
>
>      IMHO, "strip by aborting current transaction" at (4) above
>      should be replaced by "strip by rollbacking the transaction
>      after once closing it".
>
>      The latter doesn't strip the parent of the working directory at
>      stripping because of updating at (3): this is non-"parentgone"
>      case. Then, dirstate is kept as same as one at (3) above.

Is there anything else than shelve relying on this. Having shelve 
relying on a wrong/buggy behavior from core seems debatable.
It should not block us to fix core.

>
> Let me confirm how dirstate should be treated at the border of
> transaction scope.
>
> - at opening transaction:
>
>    - write in-memory dirstate changes into '.hg/dirstate' (not yet)
>
> - at successfully closing transaction:
>
>    - write changes during transaction into '.hg/dirstate' (not yet)
>    - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)
>
> - at aborting transaction:
>
>    - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
>    - discard in-memory dirstate changes (not yet)

This summary looks perfect. Any objection/issue with doing this (beside 
shelve abuse?)

>>>>> But current implementation doesn't invoke abort callbacks, if there is
>>>>> no change on repository data at aborting transaction: e.g. "hg import"
>>>>> fails to import the first patch.
>>>>
>>>> The change to dirstate should most probably register himself as change
>>>> to the transaction. That would trigger it.
>>>
>>> I found two problems of 'addfilegenerator()' below. IMHO, the former
>>> is a major problem, and the latter is minor one (but you should
>>> dislike it as later :-)).
>>
>> If addfilegenerator fails to fit our needs here, we should probably
>> hammer it so it does.
>>
>>> And, both of my fixing plans cause omitting invocation of abort
>>> callbacks, because they makes 'transaction._backupentries' empty :-<
>>>
>>> - current 'addfilegenerator()' causes forcibly restoring
>>>     '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
>>>     parents of the working directory
>>>
>>>     This causes unexpected "hg status" output at rollback-ing in such
>>>     situation (aka non-"parentgone" case).
>>
>> Do you mean there is category of change to dirstate that must not be
>> rolledback if transaction fails? I do not see data about this on the
>> wikipage (but I could be blind) Can you elaborate?
>>
>> I'm not sure to understand the issue here.
>
> "rollback" above is not one by aborting while transaction running, but
> one by "hg rollback"/"repo.rollback()" after once closing transaction.
>
> In the later case, parents of the working directory may differ
> from one at opening transaction, and forcible restoring dirstate
> causes unexpected result of subsequent "hg status".
>
> Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
> restored at rollback, and it causes forcible restoring.
>
> 'repo.rollback()' avoids this problem by restoring dirstate by itself.
>
>      https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110

I'm not sure about what this section (of the email) is about. Look like 
the linked code is close to what we need to invalidate the dirstate in 
memory content. But its late so I'm giving up with this section. I think 
we made good enough progress on the other ones.

>>>     I'm planning to introduce 'backup' optional argument, which controls
>>>     'addbackup()' invocation in '_generatefiles()' to avoid issue above,
>>>     to 'addfilegenerator()'.
>>>
>>>       https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267
>>
>> That sound wrong to me. We either have change to be apply on transaction
>> success and rolledback on transaction failure. But I can't see case were
>> backup will be wrong.
>>
>>
>>>     According to discussion before, dirstate writes changes into (a)
>>>     '.hg/dirstate' at the beginning of transaction and (b)
>>>     '.hg/dirstate.pending' while transaction running. Then, discarding
>>>     the latter can work as a kind of restoring. So, omitting
>>>     'addbackup()' invocation doesn't cause problem at least for dirstate.
>>>
>>>     Or is another trick like "add to _backupentries, but omit backup
>>>     itself" better ?
>>
>> The backup are also used for rollback (and hg recover). We must keep it.
>>
>>>
>>> - current 'addfilegenerator()' causes forcibly writing data out at
>>>     'writepending()', even if data hasn't been changed since last
>>>     writing out
>>>
>>>     When external hooks are invoked while transaction, this causes
>>>     additional "transaction abort!" and "rollback completed" output,
>>>     even though dirstate hasn't been changed since the beginning of
>>>     transaction.
>>>
>>>     For example, pushing changes/bookmarks causes writing into
>>>     'dirstate.pending' at REMOTE side, even though it never uses and
>>>     changes dirstate.
>>
>> If there is no change to the dirstate. Why are we recording any change
>> into the transaction at all?
>
> I understood that 'addfilegenerator()' is used for files which may be
> changed during transaction like "journal", and invoked it for dirstate
> at opening transaction.
>
> Should I use it after examination whether dirstate has been changed
> since opening transaction ?

The current semantic is:

   I've new content, here is how to generated it on disk.

So it should be called by the code updating dirstate to "commit" the 
change. It should replace (or be contained in) dirstate.write() calls.

(we should probably update the docstring to make it clean)
Katsunori FUJIWARA - Aug. 13, 2015, 3:17 a.m.
At Sat, 18 Jul 2015 01:48:24 +0200,
Pierre-Yves David wrote:
> 
> On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
> >
> > At Tue, 14 Jul 2015 13:26:14 +0200,
> > Pierre-Yves David wrote:
> >>
> >> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
> >>>
> >>>
> >>> At Sat, 11 Jul 2015 20:47:00 +0100,
> >>> Pierre-Yves David wrote:
> >>>
> >>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> >>>>> Now, I'm working for dirstate consistency around inside and border of
> >>>>> transaction, and planning to use this abort callback mechanism to know
> >>>>> the end of transaction at failure.
> >>>>
> >>>> I'm a bit curious about what code you need to run in such case. If you
> >>>> use the file generator mechanism you do not need to worries about
> >>>> restoring backup.
> >>>>
> >>>> However, I'm not sure how rolling back the file would atomatically
> >>>> invalide the in memory object content (probably not). Is this why you
> >>>> need a callback? could we (would it make sense to) integrate that as
> >>>> part of one of our current cache invalidation mechanism?
> >>>
> >>>
> >>> I'm planning to use callback mechanism for purposes below:
> >>>
> >>> 1. to make existing 'dirstate.write()' invocations write in-memory
> >>>      changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
> >>>      'dirstate.write()' should know whether transaction is running or not
> >>>
> >>>      I'm planning changes below for this purpose:
> >>>
> >>>      - invoke (newly added) 'dirstate.begintransaction()' at the
> >>>        beginning of transaction in 'localrepo.transaction()'
> >>>
> >>>      - invoke (newly added) 'dirstate.endtransaction()' at the end of
> >>>        transaction via "the end of transaction" callback (instead of
> >>>        finalize/abort callbacks)
> >>>
> >>>
> >>>      There are some other ways:
> >>>
> >>>      1-1. instantiate dirstate with 'repo'
> >>>           (or weak ref of repo to avoid circular reference ?)
> >>>
> >>>      1-2. make 'dirstate.write()' require 'localrepo' argument
> >>>
> >>>           This seems too large impact (especially for 3rd party
> >>>           extensions, even though Mercurial doesn't ensure compatibility
> >>>           of internal API at all :-))
> >>>
> >>>      In both cases, 'dirstate.write()' can determine appropriate output
> >>>      file by examination of 'repo.currenttransaction()'.
> >>
> >> I would prefer one of this options. Using hooks for such core code looks
> >> a bit hacky and fragile.
> >
> > How about something new like 'unlock' of 'lock' (or 'after' of
> > transaction) ? Would you think this kind of 'call back' mechanism
> > itself is also fragile for core code (if newly added) ?
> 
> I do not understand what you mean.
> 
> I'm worried about hook approach being "fragile" because it dirstate 
> defines very important data for the understanding of the repo. Having it 
> a the same place as other extensions and config hook will probably lead 
> to some ordering hell.

(Oops, 'unlock' is just a name of a local variable in
'localrepo.lock()'.  Correct name of constructor argument is
'releasefn')

As you said, using 'hook' may lead to some ordering hell by other
extensions and config hook.

What we want is a kind of 'releasefn' callback for 'lock' class, which
is called at the end of lock scope via invocation of own 'release()'
(or via '__del__' of itself).

'transaction' constructor takes 'after' callback argument, but it is
called only when transaction is successfully closing.

To provide safe notification mechanism, which is isolated from hook
infrastructure, how about newly introducing a kind of 'releasefn' for
'lock' into 'transaction' ?

BTW, 'close'/'closing' are already used as a meaning of 'successful
end of transaction' in many situations (e.g. name of hook). What name
is suitable for the hook invoked at "the end of transaction scope" ?


> >> I would be happy with wither 1-1 (weak-ref
> >> please) or 1-2. With a preference for 1-2.
> >>
> >> You can make the repo argument optional, with a devel warning if it is
> >> missing for example. To avoid too much breakage for a couple of version.
> >
> > BTW, what is your strong reason to choose 1-2 ? to avoid using
> > weak ref of repo ?
> 
> I do not have a strong opinion about 1-1 vs 1-2.
> 
> 1-2 avoid keeping the cyclic reference in the object design. We use the 
> pattern of "providing a repo" when needed in multiple place. It also 
> help refactoring code to have the argument explicitly passed. because it 
> ensure function calling write have access to all necessary argument 
> (here repo).
> 
> I think Matt have a strong opinion agains weakref.

I see.


> >>> 2. in-memory dirstate changes should be discarded at failure of
> >>>      transaction, to prevent them from being written out at (outer)
> >>>      'wlock.release' or so
> >>>
> >>>      I'm planning to do this via "the end of transaction" callback.
> >>>
> >>>
> >>>      Using 'dirstate.{begin|end}parentchange()' mechanism as below also
> >>>      works similarly.
> >>>
> >>>      - invoke beginparentchange at starting transaction
> >>>      - invoke endparentchange at (successfully) closing transaction
> >>>      - wlock.release at failure invokes 'dirstate.invalidate()'
> >>>
> >>>      But it seems too optimistic for outside transaction :-), and it
> >>>      can't ensure consistency just at the end of transaction.
> >>
> >> I'm a bit confused about why you would invoe dirstate.invalidate() at
> >> lock release. Can't we have something invalidating the dirstate content
> >> when transaction if rolledback (so that dirstate is read from disk?
> >
> > (at first, 'dirstate.{begin|end}parentchange()' example above is
> > just a example to do similar thing in another way, but not what I
> > want to do)
> >
> >> Do we have something forcing dirstate flushing at the transaction
> >> openning? (so that the backup we restore is okay?)
> >
> > AFAIK, with current implementation:
> >
> > - in-memory dirstate changes aren't explicitly written out at opening
> >    transaction
> >
> >    In almost all cases, in-memory dirstate changes seem to be
> >    INDIRECTLY written out before transaction opening. But it isn't
> >    explicitly ensured.
> >
> >    This causes:
> >
> >    1. saving incomplete '.hg/dirstate' into '.hg/journal.dirstate' at
> >       opening transaction, and
> >
> >    2. restoring one from (incorrect) '.hg/undo.dirstate', which is
> >       renamed from '.hg/journal.dirstate' above, at rollback
> >
> >    (Oops, I have overlooked that journal file is created not from
> >    dirstate instance but from '.hg/dirstate' file at opening
> >    transaction :-< we have to fix this at first)
> >
> >    For example, at "hg backout --commit", dirstate changes via
> >    reverting aren't written out until 'ctx.markcommitted()'
> >    after closing the transaction. And, "hg status" after "hg
> >    rollback" shows unexpected status for added/removed files.
> >
> >    BTW, after my previous change fe03f522dda9, this issue can be
> >    reproduced, only when 'repo.status()' doesn't cause 'normal()'-ing
> >    on some unsure files :-)
> >
> >      https://selenic.com/repo/hg/rev/fe03f522dda9
> 
> It seems like we should never start a transaction with a dirty dirstate. 
> Is there any theoritical reason for not doing it?

I don't know any theoretical reason for not doing it, and this issue
was fixed by 800e090e9c64.

    https://selenic.com/repo/hg/rev/800e090e9c64

> >
> > - in-memory dirstate changes aren't explicitly discarded at
> >    aborting transaction
> >
> >    (for convenience, let's assume that '.hg/journal.dirstate' is
> >    correct, at this point)
> >
> >    As you described, aborting transaction restores '.hg/dirstate' from
> >    '.hg/journal.dirstate' saved at opening transaction.
> >
> >    But in-memory changes during transaction are still kept in
> >    'dirstate' instance, and they may be written into already restored
> >    '.hg/dirstate' at 'wlock.release()' or so.
> 
> This sounds wrong (but less keep moving)
> 
> >
> >    BTW, commands almost fully enclosed by transaction treat "in-memory
> >    changes during aborted transaction" in ways below:
> >
> >    - in-memory changes are discarded by 'dirstateguard' OUTSIDE
> >      transaction scope
> >      ('hg qpush', 'hg commit --amend', and 'hg import')
> >
> >      "discarding in-memory dirstate changes at aborting
> >      transaction" will make dirstateguard in these cases
> >      useless.
> >
> >    - in-memory changes should be discarded, but not
> >      ('hg transplant')
> 
> I do not understand this sentence.

This issue of transplant was fixed by 99e88320d665.

    https://selenic.com/repo/hg/rev/99e88320d665

> >      Aborting for other than conflict may cause unexpected dirstate
> >      (e.g. referring rollbacked revision as parent). This should be
> >      fixed.
> >
> >    - discarding in-memory changes causes problem
> >      ('hg shelve' and 'hg unshelve')
> >
> >      Current shelve/unshelve implementation expects aborting
> >      transaction NOT to discard in-memory dirstate changes intentionally
> >      or unintentionally.
> >
> >      1. start transaction
> >      2. do shelve/unshelve, and this makes temporary revisions
> >      3. update working directory to the parent revision at (1)
> >         (+ some additional actions)
> >         - dirstate is different from one at (1), because
> >           some changes are shelved/unshelved
> >      4. abort transaction intentionally
> >         - rollback revisions created at (2)
> >         - restore '.hg/dirstate' from one at (1)
> >      5. write dirstate at 'wlock.release'
> >         - here, '.hg/dirstate' is equal to one at (3)
> >
> >      IMHO, "strip by aborting current transaction" at (4) above
> >      should be replaced by "strip by rollbacking the transaction
> >      after once closing it".
> >
> >      The latter doesn't strip the parent of the working directory at
> >      stripping because of updating at (3): this is non-"parentgone"
> >      case. Then, dirstate is kept as same as one at (3) above.
> 
> Is there anything else than shelve relying on this. Having shelve 
> relying on a wrong/buggy behavior from core seems debatable.
> It should not block us to fix core.

AFAIK, only shelve relies on this wrong/buggy behavior.

I confirmed that "explicitly rollbacking transaction just after
committing it" can replace current problematic implementation in
shelve.


> > Let me confirm how dirstate should be treated at the border of
> > transaction scope.
> >
> > - at opening transaction:
> >
> >    - write in-memory dirstate changes into '.hg/dirstate' (not yet)
> >
> > - at successfully closing transaction:
> >
> >    - write changes during transaction into '.hg/dirstate' (not yet)
> >    - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)
> >
> > - at aborting transaction:
> >
> >    - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
> >    - discard in-memory dirstate changes (not yet)
> 
> This summary looks perfect. Any objection/issue with doing this (beside 
> shelve abuse?)
> 
> >>>>> But current implementation doesn't invoke abort callbacks, if there is
> >>>>> no change on repository data at aborting transaction: e.g. "hg import"
> >>>>> fails to import the first patch.
> >>>>
> >>>> The change to dirstate should most probably register himself as change
> >>>> to the transaction. That would trigger it.
> >>>
> >>> I found two problems of 'addfilegenerator()' below. IMHO, the former
> >>> is a major problem, and the latter is minor one (but you should
> >>> dislike it as later :-)).
> >>
> >> If addfilegenerator fails to fit our needs here, we should probably
> >> hammer it so it does.
> >>
> >>> And, both of my fixing plans cause omitting invocation of abort
> >>> callbacks, because they makes 'transaction._backupentries' empty :-<
> >>>
> >>> - current 'addfilegenerator()' causes forcibly restoring
> >>>     '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
> >>>     parents of the working directory
> >>>
> >>>     This causes unexpected "hg status" output at rollback-ing in such
> >>>     situation (aka non-"parentgone" case).
> >>
> >> Do you mean there is category of change to dirstate that must not be
> >> rolledback if transaction fails? I do not see data about this on the
> >> wikipage (but I could be blind) Can you elaborate?
> >>
> >> I'm not sure to understand the issue here.
> >
> > "rollback" above is not one by aborting while transaction running, but
> > one by "hg rollback"/"repo.rollback()" after once closing transaction.
> >
> > In the later case, parents of the working directory may differ
> > from one at opening transaction, and forcible restoring dirstate
> > causes unexpected result of subsequent "hg status".
> >
> > Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
> > restored at rollback, and it causes forcible restoring.
> >
> > 'repo.rollback()' avoids this problem by restoring dirstate by itself.
> >
> >      https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110

I put the link again but use concrete changeset ID instead of 'tip',
for referring safely in the future :-)

    https://selenic.com/repo/hg/file/a7527c5769bb/mercurial/localrepo.py#l1116


> I'm not sure about what this section (of the email) is about. Look like 
> the linked code is close to what we need to invalidate the dirstate in 
> memory content. But its late so I'm giving up with this section. I think 
> we made good enough progress on the other ones.

'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to
restore dirstate (1) at aborting transaction, and (2) at "hg rollback"
after committing transaction.

In the former case, restoring dirstate from 'journal.dirstate' is
always correct (no explanation is needed, isn't it ? :-))

In the latter case, the working directory may be updated to the
revision other than one(s) committed in previous transaction.

If NOT so (case 2-1), dirstate should be restored like case (1),
because current parent revision will go away after rollbacking. This
situation is called as "parentgone".

But otherwise (case 2-2), dirstate shouldn't be restored from
'undo.dirstate', because this restoring is equal to unexpected "hg
debugsetparent" (+ breaking file status) for users.

Here, current 'addfilegenerator()' implementation forcibly puts the
specified file name into 'journal.backupfiles' (or 'undo.backupfiles'
renamed from it) to restore changed files at aborting (or rollbacking)
transaction.

Then, using 'addfilegenerator()' always causes restoring dirstate via
'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1)
or not (2-2), even though 'localrepo._rollback()' tries to avoid it in
case (2-2). As described above, this is unexpected "hg debugsetparent"
(+ breaking file status).


> >>>     I'm planning to introduce 'backup' optional argument, which controls
> >>>     'addbackup()' invocation in '_generatefiles()' to avoid issue above,
> >>>     to 'addfilegenerator()'.
> >>>
> >>>       https://selenic.com/repo/hg/file/tip/mercurial/transaction.py#l267
> >>
> >> That sound wrong to me. We either have change to be apply on transaction
> >> success and rolledback on transaction failure. But I can't see case were
> >> backup will be wrong.
> >>
> >>
> >>>     According to discussion before, dirstate writes changes into (a)
> >>>     '.hg/dirstate' at the beginning of transaction and (b)
> >>>     '.hg/dirstate.pending' while transaction running. Then, discarding
> >>>     the latter can work as a kind of restoring. So, omitting
> >>>     'addbackup()' invocation doesn't cause problem at least for dirstate.
> >>>
> >>>     Or is another trick like "add to _backupentries, but omit backup
> >>>     itself" better ?
> >>
> >> The backup are also used for rollback (and hg recover). We must keep it.
> >>
> >>>
> >>> - current 'addfilegenerator()' causes forcibly writing data out at
> >>>     'writepending()', even if data hasn't been changed since last
> >>>     writing out
> >>>
> >>>     When external hooks are invoked while transaction, this causes
> >>>     additional "transaction abort!" and "rollback completed" output,
> >>>     even though dirstate hasn't been changed since the beginning of
> >>>     transaction.
> >>>
> >>>     For example, pushing changes/bookmarks causes writing into
> >>>     'dirstate.pending' at REMOTE side, even though it never uses and
> >>>     changes dirstate.
> >>
> >> If there is no change to the dirstate. Why are we recording any change
> >> into the transaction at all?
> >
> > I understood that 'addfilegenerator()' is used for files which may be
> > changed during transaction like "journal", and invoked it for dirstate
> > at opening transaction.
> >
> > Should I use it after examination whether dirstate has been changed
> > since opening transaction ?
> 
> The current semantic is:
> 
>    I've new content, here is how to generated it on disk.
> 
> So it should be called by the code updating dirstate to "commit" the 
> change. It should replace (or be contained in) dirstate.write() calls.
> 
> (we should probably update the docstring to make it clean)

I see.


> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Aug. 13, 2015, 1:18 p.m.
At Thu, 13 Aug 2015 12:17:22 +0900,
FUJIWARA Katsunori wrote:
> 
> 
> At Sat, 18 Jul 2015 01:48:24 +0200,
> Pierre-Yves David wrote:
> > 
> > On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
> > >
> > > At Tue, 14 Jul 2015 13:26:14 +0200,
> > > Pierre-Yves David wrote:
> > >>
> > >> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
> > >>>
> > >>>
> > >>> At Sat, 11 Jul 2015 20:47:00 +0100,
> > >>> Pierre-Yves David wrote:
> > >>>
> > >>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> > >>>>> Now, I'm working for dirstate consistency around inside and border of
> > >>>>> transaction, and planning to use this abort callback mechanism to know
> > >>>>> the end of transaction at failure.
> > >>>>
> > >>>> I'm a bit curious about what code you need to run in such case. If you
> > >>>> use the file generator mechanism you do not need to worries about
> > >>>> restoring backup.
> > >>>>
> > >>>> However, I'm not sure how rolling back the file would atomatically
> > >>>> invalide the in memory object content (probably not). Is this why you
> > >>>> need a callback? could we (would it make sense to) integrate that as
> > >>>> part of one of our current cache invalidation mechanism?
> > >>>
> > >>>
> > >>> I'm planning to use callback mechanism for purposes below:
> > >>>
> > >>> 1. to make existing 'dirstate.write()' invocations write in-memory
> > >>>      changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
> > >>>      'dirstate.write()' should know whether transaction is running or not
> > >>>
> > >>>      I'm planning changes below for this purpose:
> > >>>
> > >>>      - invoke (newly added) 'dirstate.begintransaction()' at the
> > >>>        beginning of transaction in 'localrepo.transaction()'
> > >>>
> > >>>      - invoke (newly added) 'dirstate.endtransaction()' at the end of
> > >>>        transaction via "the end of transaction" callback (instead of
> > >>>        finalize/abort callbacks)
> > >>>
> > >>>
> > >>>      There are some other ways:
> > >>>
> > >>>      1-1. instantiate dirstate with 'repo'
> > >>>           (or weak ref of repo to avoid circular reference ?)
> > >>>
> > >>>      1-2. make 'dirstate.write()' require 'localrepo' argument
> > >>>
> > >>>           This seems too large impact (especially for 3rd party
> > >>>           extensions, even though Mercurial doesn't ensure compatibility
> > >>>           of internal API at all :-))
> > >>>
> > >>>      In both cases, 'dirstate.write()' can determine appropriate output
> > >>>      file by examination of 'repo.currenttransaction()'.
> > >>
> > >> I would prefer one of this options. Using hooks for such core code looks
> > >> a bit hacky and fragile.
> > >
> > > How about something new like 'unlock' of 'lock' (or 'after' of
> > > transaction) ? Would you think this kind of 'call back' mechanism
> > > itself is also fragile for core code (if newly added) ?
> > 
> > I do not understand what you mean.
> > 
> > I'm worried about hook approach being "fragile" because it dirstate 
> > defines very important data for the understanding of the repo. Having it 
> > a the same place as other extensions and config hook will probably lead 
> > to some ordering hell.
> 
> (Oops, 'unlock' is just a name of a local variable in
                                      ^^^^^^^^^^^^^^
> 'localrepo.lock()'.  Correct name of constructor argument is
> 'releasefn')

"local variable" => "local function"

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - Aug. 17, 2015, 10:16 p.m.
On 08/12/2015 08:17 PM, FUJIWARA Katsunori wrote:
>
> At Sat, 18 Jul 2015 01:48:24 +0200,
> Pierre-Yves David wrote:
>>
>> On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
>>>
>>> At Tue, 14 Jul 2015 13:26:14 +0200,
>>> Pierre-Yves David wrote:
>>>>
>>>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
>>>>>
>>>>>
>>>>> At Sat, 11 Jul 2015 20:47:00 +0100,
>>>>> Pierre-Yves David wrote:
>>>>>
>>>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
>>>>>>> Now, I'm working for dirstate consistency around inside and border of
>>>>>>> transaction, and planning to use this abort callback mechanism to know
>>>>>>> the end of transaction at failure.
>>>>>>
>>>>>> I'm a bit curious about what code you need to run in such case. If you
>>>>>> use the file generator mechanism you do not need to worries about
>>>>>> restoring backup.
>>>>>>
>>>>>> However, I'm not sure how rolling back the file would atomatically
>>>>>> invalide the in memory object content (probably not). Is this why you
>>>>>> need a callback? could we (would it make sense to) integrate that as
>>>>>> part of one of our current cache invalidation mechanism?
>>>>>
>>>>>
>>>>> I'm planning to use callback mechanism for purposes below:
>>>>>
>>>>> 1. to make existing 'dirstate.write()' invocations write in-memory
>>>>>       changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
>>>>>       'dirstate.write()' should know whether transaction is running or not
>>>>>
>>>>>       I'm planning changes below for this purpose:
>>>>>
>>>>>       - invoke (newly added) 'dirstate.begintransaction()' at the
>>>>>         beginning of transaction in 'localrepo.transaction()'
>>>>>
>>>>>       - invoke (newly added) 'dirstate.endtransaction()' at the end of
>>>>>         transaction via "the end of transaction" callback (instead of
>>>>>         finalize/abort callbacks)
>>>>>
>>>>>
>>>>>       There are some other ways:
>>>>>
>>>>>       1-1. instantiate dirstate with 'repo'
>>>>>            (or weak ref of repo to avoid circular reference ?)
>>>>>
>>>>>       1-2. make 'dirstate.write()' require 'localrepo' argument
>>>>>
>>>>>            This seems too large impact (especially for 3rd party
>>>>>            extensions, even though Mercurial doesn't ensure compatibility
>>>>>            of internal API at all :-))
>>>>>
>>>>>       In both cases, 'dirstate.write()' can determine appropriate output
>>>>>       file by examination of 'repo.currenttransaction()'.
>>>>
>>>> I would prefer one of this options. Using hooks for such core code looks
>>>> a bit hacky and fragile.
>>>
>>> How about something new like 'unlock' of 'lock' (or 'after' of
>>> transaction) ? Would you think this kind of 'call back' mechanism
>>> itself is also fragile for core code (if newly added) ?
>>
>> I do not understand what you mean.
>>
>> I'm worried about hook approach being "fragile" because it dirstate
>> defines very important data for the understanding of the repo. Having it
>> a the same place as other extensions and config hook will probably lead
>> to some ordering hell.
>
> (Oops, 'unlock' is just a name of a local variable in
> 'localrepo.lock()'.  Correct name of constructor argument is
> 'releasefn')
>
> As you said, using 'hook' may lead to some ordering hell by other
> extensions and config hook.
>
> What we want is a kind of 'releasefn' callback for 'lock' class, which
> is called at the end of lock scope via invocation of own 'release()'
> (or via '__del__' of itself).
>
> 'transaction' constructor takes 'after' callback argument, but it is
> called only when transaction is successfully closing.
>
> To provide safe notification mechanism, which is isolated from hook
> infrastructure, how about newly introducing a kind of 'releasefn' for
> 'lock' into 'transaction' ?
>
> BTW, 'close'/'closing' are already used as a meaning of 'successful
> end of transaction' in many situations (e.g. name of hook). What name
> is suitable for the hook invoked at "the end of transaction scope" ?

I think "release" would be the right terms here. I'm however a bit lost 
about why we are diving so much into the hooking question. File 
generator should allow use to get most of what we need, doesn't it?

>> It seems like we should never start a transaction with a dirty dirstate.
>> Is there any theoritical reason for not doing it?
>
> I don't know any theoretical reason for not doing it, and this issue
> was fixed by 800e090e9c64.
>
>      https://selenic.com/repo/hg/rev/800e090e9c64

Good, let's enforce it then (or at least add a devel-warning.

>> Is there anything else than shelve relying on this. Having shelve
>> relying on a wrong/buggy behavior from core seems debatable.
>> It should not block us to fix core.
>
> AFAIK, only shelve relies on this wrong/buggy behavior.
>
> I confirmed that "explicitly rollbacking transaction just after
> committing it" can replace current problematic implementation in
> shelve.

Let's do that then.

>>> Let me confirm how dirstate should be treated at the border of
>>> transaction scope.
>>>
>>> - at opening transaction:
>>>
>>>     - write in-memory dirstate changes into '.hg/dirstate' (not yet)
>>>
>>> - at successfully closing transaction:
>>>
>>>     - write changes during transaction into '.hg/dirstate' (not yet)
>>>     - rename from '.hg/journal.dirstate' to '.hg/undo.dirstate' (OK)
>>>
>>> - at aborting transaction:
>>>
>>>     - restore '.hg/dirstate' from '.hg/journal.dirstat' (OK)
>>>     - discard in-memory dirstate changes (not yet)
>>
>> This summary looks perfect. Any objection/issue with doing this (beside
>> shelve abuse?)
>>
>>>>>>> But current implementation doesn't invoke abort callbacks, if there is
>>>>>>> no change on repository data at aborting transaction: e.g. "hg import"
>>>>>>> fails to import the first patch.
>>>>>>
>>>>>> The change to dirstate should most probably register himself as change
>>>>>> to the transaction. That would trigger it.
>>>>>
>>>>> I found two problems of 'addfilegenerator()' below. IMHO, the former
>>>>> is a major problem, and the latter is minor one (but you should
>>>>> dislike it as later :-)).
>>>>
>>>> If addfilegenerator fails to fit our needs here, we should probably
>>>> hammer it so it does.
>>>>
>>>>> And, both of my fixing plans cause omitting invocation of abort
>>>>> callbacks, because they makes 'transaction._backupentries' empty :-<
>>>>>
>>>>> - current 'addfilegenerator()' causes forcibly restoring
>>>>>      '.hg/dirstate' at rollback-ing, even if rollback-ing doesn't change
>>>>>      parents of the working directory
>>>>>
>>>>>      This causes unexpected "hg status" output at rollback-ing in such
>>>>>      situation (aka non-"parentgone" case).
>>>>
>>>> Do you mean there is category of change to dirstate that must not be
>>>> rolledback if transaction fails? I do not see data about this on the
>>>> wikipage (but I could be blind) Can you elaborate?
>>>>
>>>> I'm not sure to understand the issue here.
>>>
>>> "rollback" above is not one by aborting while transaction running, but
>>> one by "hg rollback"/"repo.rollback()" after once closing transaction.
>>>
>>> In the later case, parents of the working directory may differ
>>> from one at opening transaction, and forcible restoring dirstate
>>> causes unexpected result of subsequent "hg status".
>>>
>>> Current 'addfilegenerator()' mechanism records '.hg/dirstate' as to be
>>> restored at rollback, and it causes forcible restoring.
>>>
>>> 'repo.rollback()' avoids this problem by restoring dirstate by itself.
>>>
>>>       https://selenic.com/repo/hg/file/tip/mercurial/localrepo.py#l1110
>
> I put the link again but use concrete changeset ID instead of 'tip',
> for referring safely in the future :-)
>
>      https://selenic.com/repo/hg/file/a7527c5769bb/mercurial/localrepo.py#l1116
>> I'm not sure about what this section (of the email) is about. Look like
>> the linked code is close to what we need to invalidate the dirstate in
>> memory content. But its late so I'm giving up with this section. I think
>> we made good enough progress on the other ones.
>
> 'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to
> restore dirstate (1) at aborting transaction, and (2) at "hg rollback"
> after committing transaction.


> In the former case, restoring dirstate from 'journal.dirstate' is
> always correct (no explanation is needed, isn't it ? :-))

I would says so (file state may be a bit fun, but…)


> In the latter case, the working directory may be updated to the
> revision other than one(s) committed in previous transaction.
>
> If NOT so (case 2-1), dirstate should be restored like case (1),
> because current parent revision will go away after rollbacking. This
> situation is called as "parentgone".
>
> But otherwise (case 2-2), dirstate shouldn't be restored from
> 'undo.dirstate', because this restoring is equal to unexpected "hg
> debugsetparent" (+ breaking file status) for users.

urg, I see.

> Here, current 'addfilegenerator()' implementation forcibly puts the
> specified file name into 'journal.backupfiles' (or 'undo.backupfiles'
> renamed from it) to restore changed files at aborting (or rollbacking)
> transaction.
>
> Then, using 'addfilegenerator()' always causes restoring dirstate via
> 'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1)
> or not (2-2), even though 'localrepo._rollback()' tries to avoid it in
> case (2-2). As described above, this is unexpected "hg debugsetparent"
> (+ breaking file status).

I see, this is unfortunate. Is there other working copy specific file we 
should be careful about too?

Thanks for the details!
Katsunori FUJIWARA - Aug. 19, 2015, 2:11 a.m.
At Mon, 17 Aug 2015 15:16:51 -0700,
Pierre-Yves David wrote:
> 
> On 08/12/2015 08:17 PM, FUJIWARA Katsunori wrote:
> >
> > At Sat, 18 Jul 2015 01:48:24 +0200,
> > Pierre-Yves David wrote:
> >>
> >> On 07/16/2015 04:43 PM, FUJIWARA Katsunori wrote:
> >>>
> >>> At Tue, 14 Jul 2015 13:26:14 +0200,
> >>> Pierre-Yves David wrote:
> >>>>
> >>>> On 07/13/2015 07:44 AM, FUJIWARA Katsunori wrote:
> >>>>>
> >>>>> At Sat, 11 Jul 2015 20:47:00 +0100,
> >>>>> Pierre-Yves David wrote:
> >>>>>
> >>>>>> On 07/11/2015 11:31 AM, FUJIWARA Katsunori wrote:
> >>>>>>
> >>>>>>> Now, I'm working for dirstate consistency around inside and border of
> >>>>>>> transaction, and planning to use this abort callback mechanism to know
> >>>>>>> the end of transaction at failure.
> >>>>>>
> >>>>>> I'm a bit curious about what code you need to run in such case. If you
> >>>>>> use the file generator mechanism you do not need to worries about
> >>>>>> restoring backup.
> >>>>>>
> >>>>>> However, I'm not sure how rolling back the file would atomatically
> >>>>>> invalide the in memory object content (probably not). Is this why you
> >>>>>> need a callback? could we (would it make sense to) integrate that as
> >>>>>> part of one of our current cache invalidation mechanism?
> >>>>>
> >>>>>
> >>>>> I'm planning to use callback mechanism for purposes below:
> >>>>>
> >>>>> 1. to make existing 'dirstate.write()' invocations write in-memory
> >>>>>       changes into appropriate file ('.hg/dirstate' or '.hg/dirstate.pending'),
> >>>>>       'dirstate.write()' should know whether transaction is running or not
> >>>>>
> >>>>>       I'm planning changes below for this purpose:
> >>>>>
> >>>>>       - invoke (newly added) 'dirstate.begintransaction()' at the
> >>>>>         beginning of transaction in 'localrepo.transaction()'
> >>>>>
> >>>>>       - invoke (newly added) 'dirstate.endtransaction()' at the end of
> >>>>>         transaction via "the end of transaction" callback (instead of
> >>>>>         finalize/abort callbacks)
> >>>>>
> >>>>>
> >>>>>       There are some other ways:
> >>>>>
> >>>>>       1-1. instantiate dirstate with 'repo'
> >>>>>            (or weak ref of repo to avoid circular reference ?)
> >>>>>
> >>>>>       1-2. make 'dirstate.write()' require 'localrepo' argument
> >>>>>
> >>>>>            This seems too large impact (especially for 3rd party
> >>>>>            extensions, even though Mercurial doesn't ensure compatibility
> >>>>>            of internal API at all :-))
> >>>>>
> >>>>>       In both cases, 'dirstate.write()' can determine appropriate output
> >>>>>       file by examination of 'repo.currenttransaction()'.
> >>>>
> >>>> I would prefer one of this options. Using hooks for such core code looks
> >>>> a bit hacky and fragile.
> >>>
> >>> How about something new like 'unlock' of 'lock' (or 'after' of
> >>> transaction) ? Would you think this kind of 'call back' mechanism
> >>> itself is also fragile for core code (if newly added) ?
> >>
> >> I do not understand what you mean.
> >>
> >> I'm worried about hook approach being "fragile" because it dirstate
> >> defines very important data for the understanding of the repo. Having it
> >> a the same place as other extensions and config hook will probably lead
> >> to some ordering hell.
> >
> > (Oops, 'unlock' is just a name of a local variable in
> > 'localrepo.lock()'.  Correct name of constructor argument is
> > 'releasefn')
> >
> > As you said, using 'hook' may lead to some ordering hell by other
> > extensions and config hook.
> >
> > What we want is a kind of 'releasefn' callback for 'lock' class, which
> > is called at the end of lock scope via invocation of own 'release()'
> > (or via '__del__' of itself).
> >
> > 'transaction' constructor takes 'after' callback argument, but it is
> > called only when transaction is successfully closing.
> >
> > To provide safe notification mechanism, which is isolated from hook
> > infrastructure, how about newly introducing a kind of 'releasefn' for
> > 'lock' into 'transaction' ?
> >
> > BTW, 'close'/'closing' are already used as a meaning of 'successful
> > end of transaction' in many situations (e.g. name of hook). What name
> > is suitable for the hook invoked at "the end of transaction scope" ?
> 
> I think "release" would be the right terms here. I'm however a bit lost 
> about why we are diving so much into the hooking question. File 
> generator should allow use to get most of what we need, doesn't it?

We need "the end of transaction scope" callback to discarded in-memory
dirstate changes at failure, because:

  - '.hg/dirstate' file itself is restored from '.hg/journal.dirstate'
    at failure of transaction as expected, but

  - in-memory dirstate changes aren't discarded even at failure of
    transaction, therefore

  - outer 'wlock.release()' causes writing such harmful in-memory
    changes out (in this case, in-memory changes may refer already
    stripped revisions)

In some cases, 'dirstateguard' avoids such unexpected writing dirstate
changes out outside transaction scope, but this should be done inside
transaction scope, IMHO.


> >> I'm not sure about what this section (of the email) is about. Look like
> >> the linked code is close to what we need to invalidate the dirstate in
> >> memory content. But its late so I'm giving up with this section. I think
> >> we made good enough progress on the other ones.
> >
> > 'journal.dirstate' (or 'undo.dirstate' renamed from it) is used to
> > restore dirstate (1) at aborting transaction, and (2) at "hg rollback"
> > after committing transaction.
> 
> 
> > In the former case, restoring dirstate from 'journal.dirstate' is
> > always correct (no explanation is needed, isn't it ? :-))
> 
> I would says so (file state may be a bit fun, but…)
> 
> 
> > In the latter case, the working directory may be updated to the
> > revision other than one(s) committed in previous transaction.
> >
> > If NOT so (case 2-1), dirstate should be restored like case (1),
> > because current parent revision will go away after rollbacking. This
> > situation is called as "parentgone".
> >
> > But otherwise (case 2-2), dirstate shouldn't be restored from
> > 'undo.dirstate', because this restoring is equal to unexpected "hg
> > debugsetparent" (+ breaking file status) for users.
> 
> urg, I see.
> 
> > Here, current 'addfilegenerator()' implementation forcibly puts the
> > specified file name into 'journal.backupfiles' (or 'undo.backupfiles'
> > renamed from it) to restore changed files at aborting (or rollbacking)
> > transaction.
> >
> > Then, using 'addfilegenerator()' always causes restoring dirstate via
> > 'undo.backupfiles' at 'hg rollback' regardless of "parentgone" (2-1)
> > or not (2-2), even though 'localrepo._rollback()' tries to avoid it in
> > case (2-2). As described above, this is unexpected "hg debugsetparent"
> > (+ breaking file status).
> 
> I see, this is unfortunate. Is there other working copy specific file we 
> should be careful about too?

AFAIK, there is no other one like that: there are some other working
copy sensitive files '.hg/branch' and '.hg/bookmarks.current', but
they aren't saved/restored by transaction.


> Thanks for the details!
> 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/transaction.py b/mercurial/transaction.py
--- a/mercurial/transaction.py
+++ b/mercurial/transaction.py
@@ -135,8 +135,10 @@  class transaction(object):
         # holds callback to call when writing the transaction
         self._finalizecallback = {}
         # hold callback for post transaction close
         self._postclosecallback = {}
+        # holds callbacks to call during abort
+        self._abortcallback = {}
 
     def __del__(self):
         if self.journal:
             self._abort()
@@ -360,8 +362,19 @@  class transaction(object):
         """
         self._postclosecallback[category] = callback
 
     @active
+    def addabort(self, category, callback):
+        """add a callback to be called when the transaction is aborted.
+
+        The transaction will be given as the first argument to the callback.
+
+        Category is a unique identifier to allow overwriting an old callback
+        with a newer callback.
+        """
+        self._abortcallback[category] = callback
+
+    @active
     def close(self):
         '''commit the transaction'''
         if self.count == 1:
             self._generatefiles()
@@ -442,8 +455,10 @@  class transaction(object):
 
             self.report(_("transaction abort!\n"))
 
             try:
+                for cat in sorted(self._abortcallback):
+                    self._abortcallback[cat](self)
                 _playback(self.journal, self.report, self.opener, self._vfsmap,
                           self.entries, self._backupentries, False)
                 self.report(_("rollback completed\n"))
             except Exception: