Patchwork [1,of,9,V3] cmdutil: add the class to restore dirstate at unexpected failure easily

login
register
mail settings
Submitter Katsunori FUJIWARA
Date May 7, 2015, 3:15 a.m.
Message ID <449a46109f0c96bb8d82.1430968548@feefifofum>
Download mbox | patch
Permalink /patch/8936/
State Accepted
Delegated to: Pierre-Yves David
Headers show

Comments

Katsunori FUJIWARA - May 7, 2015, 3:15 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1430968030 -32400
#      Thu May 07 12:07:10 2015 +0900
# Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
# Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
cmdutil: add the class to restore dirstate at unexpected failure easily

Before this patch, after "dirstate.write()" execution, there is no way
to restore dirstate to the original status before "dirstate.write()".

In some code paths, "dirstate.invalidate()" is used as a kind of
"restore .hg/dirstate to the original status". But it just avoids
writing changes in memory out, and doesn't actually restore
".hg/dirstate" file. "dirstate.write()" prevents it from working as
expected.

To fix the issue that recent (in memory) dirstate isn't visible to
external process (e.g. "precommit" hooks), "dirstate.write()" should
be invoked before invocation of external process. But at the same
time, ".hg/dirstate" should be restored to the status before
"dirstate.write()" at unexpected failure in some cases.

This patch adds the class "dirstateguard" to easily restore
".hg/dirstate" at unexpected failure. Typical usecase of it is:

    # (1) build dirstate up
    ....

    # (2) write dirstate out, and backup ".hg/dirstate"
    dsguard = dirstateguard(repo, 'scopename')
    try:
        # (3) execute somethig to do:
        #     this may imply making some additional changes on dirstate
        ....

        # (4) unlink backup-ed dirstate file at the end of dsguard scope
        dsguard.close()
    finally:
        # (5) if execution is aborted before "dsguard.close()",
        #     ".hg/dirstate" is restored from the backup
        dsguard.release()

For this kind of issue, "extending transaction" approach (in
https://titanpad.com/mercurial32-sprint) seems not to be suitable,
because:

  - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and

  - "dirstate" may be already modified since the beginning of OUTER
    transaction scope, then

  - dirstate should be backed up into the file other than
    "dirstate.journal" at the beginning of INNER transaction scope, but

  - such alternative backup files are useless for transaction itself,
    and increases complication of its implementation

"transaction" and "dirstateguard" differ from each other also in "what
it should do for .hg/dirstate" in cases other than success.

  ============== ======= ======== =============
  type           success fail     "hg rollback"
  ============== ======= ======== =============
  transaction    keep     keep     restore
  dirstateguard  keep     restore  (not implied)
  ============== ======= ======== =============
Pierre-Yves David - May 9, 2015, 2 a.m.
On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1430968030 -32400
> #      Thu May 07 12:07:10 2015 +0900
> # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
> cmdutil: add the class to restore dirstate at unexpected failure easily
>
> Before this patch, after "dirstate.write()" execution, there is no way
> to restore dirstate to the original status before "dirstate.write()".
>
> In some code paths, "dirstate.invalidate()" is used as a kind of
> "restore .hg/dirstate to the original status". But it just avoids
> writing changes in memory out, and doesn't actually restore
> ".hg/dirstate" file. "dirstate.write()" prevents it from working as
> expected.
>
> To fix the issue that recent (in memory) dirstate isn't visible to
> external process (e.g. "precommit" hooks), "dirstate.write()" should
> be invoked before invocation of external process. But at the same
> time, ".hg/dirstate" should be restored to the status before
> "dirstate.write()" at unexpected failure in some cases.
>
> This patch adds the class "dirstateguard" to easily restore
> ".hg/dirstate" at unexpected failure. Typical usecase of it is:
>
>      # (1) build dirstate up
>      ....
>
>      # (2) write dirstate out, and backup ".hg/dirstate"
>      dsguard = dirstateguard(repo, 'scopename')
>      try:
>          # (3) execute somethig to do:
>          #     this may imply making some additional changes on dirstate
>          ....
>
>          # (4) unlink backup-ed dirstate file at the end of dsguard scope
>          dsguard.close()
>      finally:
>          # (5) if execution is aborted before "dsguard.close()",
>          #     ".hg/dirstate" is restored from the backup
>          dsguard.release()

This "invocation" before external process, sounded really like the 
"pending mechanism" used in transaction. Things are a bit different 
since the dirstate also refers to content of the working copy, that have 
been already flushed to disk. But as far as I understand, the dirstate 
refers to parents that may be only contained in the the transaction.

The real question here is: What should external writer see during the 
process.

If it make more sense for external writer to be able to have a peek at 
the dirstate, we should write it.
If external writer cannot understand what is in the dirstate data, it 
should not be written to disk until the transaction is completed, and 
using a `.pending` mechanism instead.

> For this kind of issue, "extending transaction" approach (in
> https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> because:
>
>    - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
>
>    - "dirstate" may be already modified since the beginning of OUTER
>      transaction scope, then
>
>    - dirstate should be backed up into the file other than
>      "dirstate.journal" at the beginning of INNER transaction scope, but
>
>    - such alternative backup files are useless for transaction itself,
>      and increases complication of its implementation

Some part about the transaction confuses me.

  - Yes, you are right, the dirstate changes may have a wider life spawn 
than the transaction.

  - Therefor doing backup there is not suitable (and is maybe never 
useful) (your are still right there)

  - However, for any changes started inside a transaction, we should 
never write data in .hg/dirstate until the transaction is actually 
committed (because it likely refers to unwritten data in the 
transaction). This means that hooks/editors/etc would need to access a 
.hg/distate.pending in that case.

  - The transaction API allow for both "wait until transaction is closed 
to flush that file" logic and "please write pending data for hooks" 
logic. It looks like the dirstate handling should collaborate with the 
transaction if one exist when dirstate manipulation are done. I'm not 
sure how much it happen.



Another things that concerns me is "re-entrance", you code use a 
distinct object for each scope, using its own backup and everything.
This seems simple enough, provide nesting and work probably well. 
However your backup is based on the "name" provided in the code. This 
mean that two nested calls with the same "name" will overwrite each 
other. We should throw some unique identifier in there. `id(self)` would do.


This re-entrance business transition perfectly to my last question: It 
seems like this superseed the beginparentchange/endparentchange system 
previously in place. However, there is still multiple call to this 
system. Could we, in theory replace them will with the new system? If 
not, why?

Also, this old system was living on the dirstate object directly. would 
it make sense to move the guardian business there?

Summary of questions/topics
===========================

- Do we need a pending mechanism ?
- It seems like it would make sense to attach the logic to the 
transaction if one exists. Though?
- Possible "name" collision and necessity to make identifier unique.
- What's the fate of beginparentchange

Beside the question aboves (and some typo I can fix in-flight), the 
series looks like an improvement to me. I'll take it once the above 
point are clarified.

> +class dirstateguard(object):
> +    '''Restore dirstate at unexpected failure.
> +
> +    At the construction, this class does:
> +
> +    - write current ``repo.dirstate`` out, and
> +    - save ``.hg/dirstate`` into the backup file
> +
> +    This restores ``.hg/dirstate`` from backup file, if ``release()``
> +    is invoked before ``close()``.
> +
> +    This just removes the backup file at ``close()`` before ``release()``.
> +    '''
> +
> +    def __init__(self, repo, name):
> +        repo.dirstate.write()
> +        self.repo = repo
> +        self.name = 'dirstate.backup.%s' % name

this "name" is actually a filename, would we rename that into 'filename'?
(could be a follow up)

> +        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
> +        self.active = True
> +        self.closed = False

also: Could we have all the attributes marked private as private
(could be a follow up)
Katsunori FUJIWARA - May 11, 2015, 3:49 p.m.
At Fri, 08 May 2015 19:00:02 -0700,
Pierre-Yves David wrote:
> 
> 
> 
> On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1430968030 -32400
> > #      Thu May 07 12:07:10 2015 +0900
> > # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> > # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
> > cmdutil: add the class to restore dirstate at unexpected failure easily
> >
> > Before this patch, after "dirstate.write()" execution, there is no way
> > to restore dirstate to the original status before "dirstate.write()".
> >
> > In some code paths, "dirstate.invalidate()" is used as a kind of
> > "restore .hg/dirstate to the original status". But it just avoids
> > writing changes in memory out, and doesn't actually restore
> > ".hg/dirstate" file. "dirstate.write()" prevents it from working as
> > expected.
> >
> > To fix the issue that recent (in memory) dirstate isn't visible to
> > external process (e.g. "precommit" hooks), "dirstate.write()" should
> > be invoked before invocation of external process. But at the same
> > time, ".hg/dirstate" should be restored to the status before
> > "dirstate.write()" at unexpected failure in some cases.
> >
> > This patch adds the class "dirstateguard" to easily restore
> > ".hg/dirstate" at unexpected failure. Typical usecase of it is:
> >
> >      # (1) build dirstate up
> >      ....
> >
> >      # (2) write dirstate out, and backup ".hg/dirstate"
> >      dsguard = dirstateguard(repo, 'scopename')
> >      try:
> >          # (3) execute somethig to do:
> >          #     this may imply making some additional changes on dirstate
> >          ....
> >
> >          # (4) unlink backup-ed dirstate file at the end of dsguard scope
> >          dsguard.close()
> >      finally:
> >          # (5) if execution is aborted before "dsguard.close()",
> >          #     ".hg/dirstate" is restored from the backup
> >          dsguard.release()
> 
> This "invocation" before external process, sounded really like the 
> "pending mechanism" used in transaction. Things are a bit different 
> since the dirstate also refers to content of the working copy, that have 
> been already flushed to disk. But as far as I understand, the dirstate 
> refers to parents that may be only contained in the the transaction.
> 
> The real question here is: What should external writer see during the 
> process.
> 
> If it make more sense for external writer to be able to have a peek at 
> the dirstate, we should write it.
> If external writer cannot understand what is in the dirstate data, it 
> should not be written to disk until the transaction is completed, and 
> using a `.pending` mechanism instead.
>
>
> > For this kind of issue, "extending transaction" approach (in
> > https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> > because:
> >
> >    - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
> >
> >    - "dirstate" may be already modified since the beginning of OUTER
> >      transaction scope, then
> >
> >    - dirstate should be backed up into the file other than
> >      "dirstate.journal" at the beginning of INNER transaction scope, but
> >
> >    - such alternative backup files are useless for transaction itself,
> >      and increases complication of its implementation
> 
> Some part about the transaction confuses me.
> 
>   - Yes, you are right, the dirstate changes may have a wider life spawn 
> than the transaction.
> 
>   - Therefor doing backup there is not suitable (and is maybe never 
> useful) (your are still right there)
> 
>   - However, for any changes started inside a transaction, we should 
> never write data in .hg/dirstate until the transaction is actually 
> committed (because it likely refers to unwritten data in the 
> transaction). This means that hooks/editors/etc would need to access a 
> .hg/distate.pending in that case.
> 
>   - The transaction API allow for both "wait until transaction is closed 
> to flush that file" logic and "please write pending data for hooks" 
> logic. It looks like the dirstate handling should collaborate with the 
> transaction if one exist when dirstate manipulation are done. I'm not 
> sure how much it happen.

Committing via "repo.commit()" may cause invoking editor process out
of transaction scope, but flushing dirstate is still needed: for
example, "hg commit -A" changes status of each unknown files before
"repo.commit()".

`.pending` mechanism of transaction itself can't be used directly in
such cases.

But on the other hand, editor process may be invoked in transaction
scope, and (maybe) pending changelog and so on should become visible
for external process: for example, "hg import -e" with multiple
patches should cause such situation (BTW, rebasing multiple revisions
creates a transaction for each revisions to be rebased).

So, I'll try to follow the style of `.pending` mechanism of
transaction for dirstate.



> Another things that concerns me is "re-entrance", you code use a 
> distinct object for each scope, using its own backup and everything.
> This seems simple enough, provide nesting and work probably well. 
> However your backup is based on the "name" provided in the code. This 
> mean that two nested calls with the same "name" will overwrite each 
> other. We should throw some unique identifier in there. `id(self)` would do.

I don't have any reasons to insist using "name" for identifier. I'll
revise around it. BTW, what about using both "name" and `id(self)` for
trouble shooting/debugging ?


> This re-entrance business transition perfectly to my last question: It 
> seems like this superseed the beginparentchange/endparentchange system 
> previously in place. However, there is still multiple call to this 
> system. Could we, in theory replace them will with the new system? If 
> not, why?

begin/end-parentchange system seems handy for small (a few lines)
scope changes, because it doesn't need writing current dirstate out
and backing it up.

For example, current "heavy" dirstateguard doesn't seem suitable for
"committablectx.markcommitted()".

    def markcommitted(self, node):
        self._repo.dirstate.beginparentchange()
        for f in self.modified() + self.added():
            self._repo.dirstate.normal(f)
        for f in self.removed():
            self._repo.dirstate.drop(f)
        self._repo.dirstate.setparents(node)
        self._repo.dirstate.endparentchange()

For such lightweight use, we may have to add the flag to
enable/disable "writing current dirstate out and backing it up" to
dirstateguard: when backup is disabled, "dirstate.write()" may have to
be blocked for safety.

On the other hand, begin/end-parentchange system doesn't seem suitable
for non-small scope or complicated changes, because it is difficult to
ensure that "dirstate.write()" isn't implied in such case (like ones
in mq, commands.import and cmdutil.tryimportone replaced by this
series).

begin/end-parentchange in this condition should be replaced by
dirstateguard or so, as soon as possible.


> Also, this old system was living on the dirstate object directly. would 
> it make sense to move the guardian business there?

IMHO, yes, even though some more preparations may be needed to
complete moving.


> Summary of questions/topics
> ===========================
> 
> - Do we need a pending mechanism ?
> - It seems like it would make sense to attach the logic to the 
> transaction if one exists. Though?
> - Possible "name" collision and necessity to make identifier unique.
> - What's the fate of beginparentchange
> 
> Beside the question aboves (and some typo I can fix in-flight), the 
> series looks like an improvement to me. I'll take it once the above 
> point are clarified.
> 
> > +class dirstateguard(object):
> > +    '''Restore dirstate at unexpected failure.
> > +
> > +    At the construction, this class does:
> > +
> > +    - write current ``repo.dirstate`` out, and
> > +    - save ``.hg/dirstate`` into the backup file
> > +
> > +    This restores ``.hg/dirstate`` from backup file, if ``release()``
> > +    is invoked before ``close()``.
> > +
> > +    This just removes the backup file at ``close()`` before ``release()``.
> > +    '''
> > +
> > +    def __init__(self, repo, name):
> > +        repo.dirstate.write()
> > +        self.repo = repo
> > +        self.name = 'dirstate.backup.%s' % name
> 
> this "name" is actually a filename, would we rename that into 'filename'?
> (could be a follow up)

I'll revise so.

> > +        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
> > +        self.active = True
> > +        self.closed = False
> 
> also: Could we have all the attributes marked private as private
> (could be a follow up)

I'll revise so.

 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - May 12, 2015, 12:49 a.m.
On 05/11/2015 08:49 AM, FUJIWARA Katsunori wrote:
>
> At Fri, 08 May 2015 19:00:02 -0700,
> Pierre-Yves David wrote:
>>
>>
>>
>> On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
>>> # HG changeset patch
>>> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
>>> # Date 1430968030 -32400
>>> #      Thu May 07 12:07:10 2015 +0900
>>> # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
>>> # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
>>> cmdutil: add the class to restore dirstate at unexpected failure easily
>>>
>>> Before this patch, after "dirstate.write()" execution, there is no way
>>> to restore dirstate to the original status before "dirstate.write()".
>>>
>>> In some code paths, "dirstate.invalidate()" is used as a kind of
>>> "restore .hg/dirstate to the original status". But it just avoids
>>> writing changes in memory out, and doesn't actually restore
>>> ".hg/dirstate" file. "dirstate.write()" prevents it from working as
>>> expected.
>>>
>>> To fix the issue that recent (in memory) dirstate isn't visible to
>>> external process (e.g. "precommit" hooks), "dirstate.write()" should
>>> be invoked before invocation of external process. But at the same
>>> time, ".hg/dirstate" should be restored to the status before
>>> "dirstate.write()" at unexpected failure in some cases.
>>>
>>> This patch adds the class "dirstateguard" to easily restore
>>> ".hg/dirstate" at unexpected failure. Typical usecase of it is:
>>>
>>>       # (1) build dirstate up
>>>       ....
>>>
>>>       # (2) write dirstate out, and backup ".hg/dirstate"
>>>       dsguard = dirstateguard(repo, 'scopename')
>>>       try:
>>>           # (3) execute somethig to do:
>>>           #     this may imply making some additional changes on dirstate
>>>           ....
>>>
>>>           # (4) unlink backup-ed dirstate file at the end of dsguard scope
>>>           dsguard.close()
>>>       finally:
>>>           # (5) if execution is aborted before "dsguard.close()",
>>>           #     ".hg/dirstate" is restored from the backup
>>>           dsguard.release()
>>
>> This "invocation" before external process, sounded really like the
>> "pending mechanism" used in transaction. Things are a bit different
>> since the dirstate also refers to content of the working copy, that have
>> been already flushed to disk. But as far as I understand, the dirstate
>> refers to parents that may be only contained in the the transaction.
>>
>> The real question here is: What should external writer see during the
>> process.
>>
>> If it make more sense for external writer to be able to have a peek at
>> the dirstate, we should write it.
>> If external writer cannot understand what is in the dirstate data, it
>> should not be written to disk until the transaction is completed, and
>> using a `.pending` mechanism instead.
>>
>>
>>> For this kind of issue, "extending transaction" approach (in
>>> https://titanpad.com/mercurial32-sprint) seems not to be suitable,
>>> because:
>>>
>>>     - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
>>>
>>>     - "dirstate" may be already modified since the beginning of OUTER
>>>       transaction scope, then
>>>
>>>     - dirstate should be backed up into the file other than
>>>       "dirstate.journal" at the beginning of INNER transaction scope, but
>>>
>>>     - such alternative backup files are useless for transaction itself,
>>>       and increases complication of its implementation
>>
>> Some part about the transaction confuses me.
>>
>>    - Yes, you are right, the dirstate changes may have a wider life spawn
>> than the transaction.
>>
>>    - Therefor doing backup there is not suitable (and is maybe never
>> useful) (your are still right there)
>>
>>    - However, for any changes started inside a transaction, we should
>> never write data in .hg/dirstate until the transaction is actually
>> committed (because it likely refers to unwritten data in the
>> transaction). This means that hooks/editors/etc would need to access a
>> .hg/distate.pending in that case.
>>
>>    - The transaction API allow for both "wait until transaction is closed
>> to flush that file" logic and "please write pending data for hooks"
>> logic. It looks like the dirstate handling should collaborate with the
>> transaction if one exist when dirstate manipulation are done. I'm not
>> sure how much it happen.
>
> Committing via "repo.commit()" may cause invoking editor process out
> of transaction scope, but flushing dirstate is still needed: for
> example, "hg commit -A" changes status of each unknown files before
> "repo.commit()".
>
> `.pending` mechanism of transaction itself can't be used directly in
> such cases.
>
> But on the other hand, editor process may be invoked in transaction
> scope, and (maybe) pending changelog and so on should become visible
> for external process: for example, "hg import -e" with multiple
> patches should cause such situation (BTW, rebasing multiple revisions
> creates a transaction for each revisions to be rebased).
>
> So, I'll try to follow the style of `.pending` mechanism of
> transaction for dirstate.

The way transaction works for that (roughly) is:

- write all files with a '.pending' suffix,
- set an environment variable HG_PENDING=<repo-root>
- Mercurial process during hooks detect the HG_PENDING vars and read the 
.pending file if they exists.

As you point out, there is case where the dirstate writing is not done 
during a transaction (eg: update). And in this case, we should -not- 
collaborate with the transaction and we should -not- use 
dirstate.pending file.

However, if we are in the middle of a transaction, the new dirstate will 
likely point to content of the transaction and we should rely on the 
transaction to generate a '.pending' file when needed (also meaning, we 
should teach dirstate to read them) and to flush data to disk when the 
the transaction is committed (see the file generator API, taking care of 
both).

(you can use `localrepo.currenttransaction()` to check if a transaction 
is open and if we jump onboard).

All that said, current dirstate implementation is not collaborating with 
transaction either. So your changes are an improvement. I'll take your 
series as is (with a couple of typo, and name fixed) and will hope for 
more follow up on this topic.

>> Another things that concerns me is "re-entrance", you code use a
>> distinct object for each scope, using its own backup and everything.
>> This seems simple enough, provide nesting and work probably well.
>> However your backup is based on the "name" provided in the code. This
>> mean that two nested calls with the same "name" will overwrite each
>> other. We should throw some unique identifier in there. `id(self)` would do.
>
> I don't have any reasons to insist using "name" for identifier. I'll
> revise around it. BTW, what about using both "name" and `id(self)` for
> trouble shooting/debugging ?

This sounds like a plan, I'll fix it in-flight.

>> This re-entrance business transition perfectly to my last question: It
>> seems like this superseed the beginparentchange/endparentchange system
>> previously in place. However, there is still multiple call to this
>> system. Could we, in theory replace them will with the new system? If
>> not, why?
>
> begin/end-parentchange system seems handy for small (a few lines)
> scope changes, because it doesn't need writing current dirstate out
> and backing it up.
>
> For example, current "heavy" dirstateguard doesn't seem suitable for
> "committablectx.markcommitted()".
>
>      def markcommitted(self, node):
>          self._repo.dirstate.beginparentchange()
>          for f in self.modified() + self.added():
>              self._repo.dirstate.normal(f)
>          for f in self.removed():
>              self._repo.dirstate.drop(f)
>          self._repo.dirstate.setparents(node)
>          self._repo.dirstate.endparentchange()
>
> For such lightweight use, we may have to add the flag to
> enable/disable "writing current dirstate out and backing it up" to
> dirstateguard: when backup is disabled, "dirstate.write()" may have to
> be blocked for safety.
>
> On the other hand, begin/end-parentchange system doesn't seem suitable
> for non-small scope or complicated changes, because it is difficult to
> ensure that "dirstate.write()" isn't implied in such case (like ones
> in mq, commands.import and cmdutil.tryimportone replaced by this
> series).
>
> begin/end-parentchange in this condition should be replaced by
> dirstateguard or so, as soon as possible.

I think we should have a single mechanism of people will use it wrong. 
We should also makes it mandatory (or at least issue devel-warning) to 
get people to use it.

Can we flag the old system for deletion and hunt down all it's current user?

>> Also, this old system was living on the dirstate object directly. would
>> it make sense to move the guardian business there?
>
> IMHO, yes, even though some more preparations may be needed to
> complete moving.

Could be done in a follow up.

>> Summary of questions/topics
>> ===========================
>>
>> - Do we need a pending mechanism ?
>> - It seems like it would make sense to attach the logic to the
>> transaction if one exists. Though?
>> - Possible "name" collision and necessity to make identifier unique.
>> - What's the fate of beginparentchange
>>
>> Beside the question aboves (and some typo I can fix in-flight), the
>> series looks like an improvement to me. I'll take it once the above
>> point are clarified.
>>
>>> +class dirstateguard(object):
>>> +    '''Restore dirstate at unexpected failure.
>>> +
>>> +    At the construction, this class does:
>>> +
>>> +    - write current ``repo.dirstate`` out, and
>>> +    - save ``.hg/dirstate`` into the backup file
>>> +
>>> +    This restores ``.hg/dirstate`` from backup file, if ``release()``
>>> +    is invoked before ``close()``.
>>> +
>>> +    This just removes the backup file at ``close()`` before ``release()``.
>>> +    '''
>>> +
>>> +    def __init__(self, repo, name):
>>> +        repo.dirstate.write()
>>> +        self.repo = repo
>>> +        self.name = 'dirstate.backup.%s' % name
>>
>> this "name" is actually a filename, would we rename that into 'filename'?
>> (could be a follow up)
>
> I'll revise so.

I'll apply that inflight

>
>>> +        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
>>> +        self.active = True
>>> +        self.closed = False
>>
>> also: Could we have all the attributes marked private as private
>> (could be a follow up)
>
> I'll revise so.

I'll apply that inflight
Katsunori FUJIWARA - May 12, 2015, 12:54 p.m.
At Mon, 11 May 2015 17:49:43 -0700,
Pierre-Yves David wrote:
> 
> On 05/11/2015 08:49 AM, FUJIWARA Katsunori wrote:
> >
> > At Fri, 08 May 2015 19:00:02 -0700,
> > Pierre-Yves David wrote:
> >>
> >> On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> >>> # HG changeset patch
> >>> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> >>> # Date 1430968030 -32400
> >>> #      Thu May 07 12:07:10 2015 +0900
> >>> # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> >>> # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
> >>> cmdutil: add the class to restore dirstate at unexpected failure easily

[snip]

> >>> For this kind of issue, "extending transaction" approach (in
> >>> https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> >>> because:
> >>>
> >>>     - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
> >>>
> >>>     - "dirstate" may be already modified since the beginning of OUTER
> >>>       transaction scope, then
> >>>
> >>>     - dirstate should be backed up into the file other than
> >>>       "dirstate.journal" at the beginning of INNER transaction scope, but
> >>>
> >>>     - such alternative backup files are useless for transaction itself,
> >>>       and increases complication of its implementation
> >>
> >> Some part about the transaction confuses me.
> >>
> >>    - Yes, you are right, the dirstate changes may have a wider life spawn
> >> than the transaction.
> >>
> >>    - Therefor doing backup there is not suitable (and is maybe never
> >> useful) (your are still right there)
> >>
> >>    - However, for any changes started inside a transaction, we should
> >> never write data in .hg/dirstate until the transaction is actually
> >> committed (because it likely refers to unwritten data in the
> >> transaction). This means that hooks/editors/etc would need to access a
> >> .hg/distate.pending in that case.
> >>
> >>    - The transaction API allow for both "wait until transaction is closed
> >> to flush that file" logic and "please write pending data for hooks"
> >> logic. It looks like the dirstate handling should collaborate with the
> >> transaction if one exist when dirstate manipulation are done. I'm not
> >> sure how much it happen.
> >
> > Committing via "repo.commit()" may cause invoking editor process out
> > of transaction scope, but flushing dirstate is still needed: for
> > example, "hg commit -A" changes status of each unknown files before
> > "repo.commit()".
> >
> > `.pending` mechanism of transaction itself can't be used directly in
> > such cases.
> >
> > But on the other hand, editor process may be invoked in transaction
> > scope, and (maybe) pending changelog and so on should become visible
> > for external process: for example, "hg import -e" with multiple
> > patches should cause such situation (BTW, rebasing multiple revisions
> > creates a transaction for each revisions to be rebased).
> >
> > So, I'll try to follow the style of `.pending` mechanism of
> > transaction for dirstate.
> 
> The way transaction works for that (roughly) is:
> 
> - write all files with a '.pending' suffix,
> - set an environment variable HG_PENDING=<repo-root>
> - Mercurial process during hooks detect the HG_PENDING vars and read the 
> .pending file if they exists.
> 
> As you point out, there is case where the dirstate writing is not done 
> during a transaction (eg: update). And in this case, we should -not- 
> collaborate with the transaction and we should -not- use 
> dirstate.pending file.
> 
> However, if we are in the middle of a transaction, the new dirstate will 
> likely point to content of the transaction and we should rely on the 
> transaction to generate a '.pending' file when needed (also meaning, we 
> should teach dirstate to read them) and to flush data to disk when the 
> the transaction is committed (see the file generator API, taking care of 
> both).
> 
> (you can use `localrepo.currenttransaction()` to check if a transaction 
> is open and if we jump onboard).
> 
> All that said, current dirstate implementation is not collaborating with 
> transaction either. So your changes are an improvement. I'll take your 
> series as is (with a couple of typo, and name fixed) and will hope for 
> more follow up on this topic.

Thank you for pushing to clowncopter with in-flight fixing.

After working with '.pending' mechanism, I realize that writing
dirstate out should be done in "cmdutil.commitforceeditor()" instead
of in "repo.commit()" for '.pending' awareness.

Could you drop only #9 of this series from clowncopter ?

I'll soon post follow up series, which also takes care of ".pending"
mechanism (regardless of dropping #9).


> >> This re-entrance business transition perfectly to my last question: It
> >> seems like this superseed the beginparentchange/endparentchange system
> >> previously in place. However, there is still multiple call to this
> >> system. Could we, in theory replace them will with the new system? If
> >> not, why?
> >
> > begin/end-parentchange system seems handy for small (a few lines)
> > scope changes, because it doesn't need writing current dirstate out
> > and backing it up.
> >
> > For example, current "heavy" dirstateguard doesn't seem suitable for
> > "committablectx.markcommitted()".
> >
> >      def markcommitted(self, node):
> >          self._repo.dirstate.beginparentchange()
> >          for f in self.modified() + self.added():
> >              self._repo.dirstate.normal(f)
> >          for f in self.removed():
> >              self._repo.dirstate.drop(f)
> >          self._repo.dirstate.setparents(node)
> >          self._repo.dirstate.endparentchange()
> >
> > For such lightweight use, we may have to add the flag to
> > enable/disable "writing current dirstate out and backing it up" to
> > dirstateguard: when backup is disabled, "dirstate.write()" may have to
> > be blocked for safety.
> >
> > On the other hand, begin/end-parentchange system doesn't seem suitable
> > for non-small scope or complicated changes, because it is difficult to
> > ensure that "dirstate.write()" isn't implied in such case (like ones
> > in mq, commands.import and cmdutil.tryimportone replaced by this
> > series).
> >
> > begin/end-parentchange in this condition should be replaced by
> > dirstateguard or so, as soon as possible.
> 
> I think we should have a single mechanism of people will use it wrong. 
> We should also makes it mandatory (or at least issue devel-warning) to 
> get people to use it.
> 
> Can we flag the old system for deletion and hunt down all it's current user?

I'll try.

> >> Also, this old system was living on the dirstate object directly. would
> >> it make sense to move the guardian business there?
> >
> > IMHO, yes, even though some more preparations may be needed to
> > complete moving.
> 
> Could be done in a follow up.

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - May 19, 2015, 5:56 a.m.
On 05/12/2015 07:54 AM, FUJIWARA Katsunori wrote:
> After working with '.pending' mechanism, I realize that writing
> dirstate out should be done in "cmdutil.commitforceeditor()" instead
> of in "repo.commit()" for '.pending' awareness.
>
> Could you drop only #9 of this series from clowncopter ?

What's the status on this? I would be happy to see issue 4378 fixed and 
we seemed to be only a few centimeters away.
Katsunori FUJIWARA - May 19, 2015, 2:31 p.m.
At Tue, 19 May 2015 00:56:06 -0500,
Pierre-Yves David wrote:
> 
> 
> 
> On 05/12/2015 07:54 AM, FUJIWARA Katsunori wrote:
> > After working with '.pending' mechanism, I realize that writing
> > dirstate out should be done in "cmdutil.commitforceeditor()" instead
> > of in "repo.commit()" for '.pending' awareness.
> >
> > Could you drop only #9 of this series from clowncopter ?
> 
> What's the status on this? I would be happy to see issue 4378 fixed and 
> we seemed to be only a few centimeters away.

I just finished all, and am sending 2 series below one after another !

  1. prepare for making dirstate aware of PENDING mechanism (6 patches)

  2. make in-memory dirstate changes visible to external process
     2.1 commit editor (for issue4378)
     2.2 precommit hooks
     2.3 pretxncommit hooks
     2.4 preupdate and update hooks

BTW, I will also send another series for STABLE to fix TXN*/pending
hook argument problems found while working for issue4378 :-) The last
patch of this series has a relation with (2.2) and (2.4) above. Please
pay attention to this series, too.


> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -3259,3 +3259,59 @@ 
     for f, clearable, allowcommit, msg, hint in unfinishedstates:
         if clearable and repo.vfs.exists(f):
             util.unlink(repo.join(f))
+
+class dirstateguard(object):
+    '''Restore dirstate at unexpected failure.
+
+    At the construction, this class does:
+
+    - write current ``repo.dirstate`` out, and
+    - save ``.hg/dirstate`` into the backup file
+
+    This restores ``.hg/dirstate`` from backup file, if ``release()``
+    is invoked before ``close()``.
+
+    This just removes the backup file at ``close()`` before ``release()``.
+    '''
+
+    def __init__(self, repo, name):
+        repo.dirstate.write()
+        self.repo = repo
+        self.name = 'dirstate.backup.%s' % name
+        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
+        self.active = True
+        self.closed = False
+
+    def __del__(self):
+        if self.active: # still active
+            # this may occur, even if this class is used correctly:
+            # for example, releasing other resources like transaction
+            # may raise exception before ``dirstateguard.release`` in
+            # ``release(tr, ....)``.
+            self._abort()
+
+    def close(self):
+        if not self.active: # already inactivated
+            msg = (_("can't close already inactivated backup: %s")
+                   % self.name)
+            raise util.Abort(msg)
+
+        self.repo.vfs.unlink(self.name)
+        self.active = False
+        self.closed = True
+
+    def _abort(self):
+        # this "invalidate()" prevents "wlock.release()" from writing
+        # changes of dirstate out after restoring to original status
+        self.repo.dirstate.invalidate()
+
+        self.repo.vfs.rename(self.name, 'dirstate')
+        self.active = False
+
+    def release(self):
+        if not self.closed:
+            if not self.active: # already inactivated
+                msg = (_("can't release already inactivated backup: %s")
+                       % self.name)
+                raise util.Abort(msg)
+            self._abort()