Patchwork [3,of,6] dirstate: read from `.pending` file under HG_PENDING mode

login
register
mail settings
Submitter Katsunori FUJIWARA
Date May 19, 2015, 4:42 p.m.
Message ID <e830ef506b9aaa01c540.1432053723@feefifofum>
Download mbox | patch
Permalink /patch/9180/
State Changes Requested
Delegated to: Pierre-Yves David
Headers show

Comments

Katsunori FUJIWARA - May 19, 2015, 4:42 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1432051569 -32400
#      Wed May 20 01:06:09 2015 +0900
# Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
# Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
dirstate: read from `.pending` file under HG_PENDING mode

True/False value of `_pendingmode` means whether `dirstate.pending` is
used to initialize own `_map` and so on. When it is None, neither
`dirstate` nor `dirstate.pending` is read in yet.

This is used to keep consistent view between `_pl()` and `_read()`.

Once `_pendingmode` is determined by reading one of `dirstate` or
`dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
is invoked. This should be reasonable, because:

- effective `invalidate()` invocation should occur only in wlock scope, and
- wlock can't be gotten under HG_PENDING mode

For review-ability, this patch focuses only on reading `.pending` file
in. Then, `dirstate.write()` can write changes out even under
HG_PENDING mode. But it is still safe enough, because there is no code
path writing `.pending` file out yet (= `_pendingmode` never be True).

`_trypending()` is defined as a normal function to factor similar code
path (in bookmarks and phases) out in the future easily.
Pierre-Yves David - May 21, 2015, 9:59 p.m.
On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1432051569 -32400
> #      Wed May 20 01:06:09 2015 +0900
> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> dirstate: read from `.pending` file under HG_PENDING mode
>
> True/False value of `_pendingmode` means whether `dirstate.pending` is
> used to initialize own `_map` and so on. When it is None, neither
> `dirstate` nor `dirstate.pending` is read in yet.
>
> This is used to keep consistent view between `_pl()` and `_read()`.
>
> Once `_pendingmode` is determined by reading one of `dirstate` or
> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> is invoked. This should be reasonable, because:
>
> - effective `invalidate()` invocation should occur only in wlock scope, and
> - wlock can't be gotten under HG_PENDING mode
>
> For review-ability, this patch focuses only on reading `.pending` file
> in. Then, `dirstate.write()` can write changes out even under
> HG_PENDING mode. But it is still safe enough, because there is no code
> path writing `.pending` file out yet (= `_pendingmode` never be True).
>
> `_trypending()` is defined as a normal function to factor similar code
> path (in bookmarks and phases) out in the future easily.

This series is poking in the right direction, but it is failing to 
acknowledge the following list of principle and will not result in a 
working behavior.

1) '.pending' files are made necessary by the existence of a transaction,
2) HG_PENDING (and .pending files usage) must -only- occured in the 
context of a transaction.
3) The transaction object is the one handle the '.pending' files and 
associated non-pending file. This includes:

3.1) The transaction must purge any existing '.pending' when it open/close
3.2) generation of .pending files is done through the transaction
3.3) writing of the final files is also done through the transaction

*) We now have a developer warning feature, we should use it more.

Let me details this more.


1) Transaction make '.pending' necessary.
-----------------------------------------

The goal of transaction is to prevent any external viewer to see its 
content before it is committed. So anything happening during the 
transaction is actually written to disk only at `tr.close()` time.

To lets hooks and tools see the content of the transaction, the 
'pending' mechanism is used. An environment variable let sub-invocation 
of hg know that it needs to looks for extra data in different file.



2) pending happen during transaction only
-----------------------------------------

In the general case, we do not use the same 'pending' mechanism for 
dirstate related changes for the following reason:

- dirstate is related to the working copy. Most external viewer like 
clone/pull/push do not care about the working copy so we are not at risk 
of uncommited data escaping the repository.

- dirstate is related to the content of working copy (parents, but also 
file content). We cannot have an atomic update of all the file in the 
working copy so we'll have transient invalid state during the update 
anyway, therefor we are not even trying to ensure we have an atomic 
transaction for external viewer.

However, there is a case where we need dirstate to use a '.pending' 
mechanism: When the dirstate change makes it refer to content contained 
in a not-yet committed transaction.

- we do not want to make such dirstate visible until the transaction 
content is visible.

- sub-invocation of mercurial in hook/tool need to be able to see this 
content anyway.

To simplify things. I'm assuming:

   "dirstate refer to content in the transaction."

is the same as:

   "dirstate change happen during a transaction."


So, if your dirstate change happens inside a transaction, it should be 
collaborating with it. Your series have to make use of 
the`repo.currenttransaction()` method at least once somewhere.

3) The transaction object is the one handling file
--------------------------------------------------

To simplify entry points and ensure consistency, the transaction is 
handling all files involved in the transaction from end to end. So when 
dirstate collaborate with the transaction, it has to let the transaction 
handle the dirstate file


related to point 3.2: We are generating `pending` file on demand, 
because if no subprocess is to be call, we do not need to generated 
them. Every subcall that can need pending file will use call 
`tr.pending()` to trigger this generation. This is a single entry point 
that the code know and use. The function will also handle the case where 
there is no pending changes and not special logic is needed.

To get the transaction able to include your pending file, you have to 
register a "file generator" with `tr.addfilegenerator` the documentation 
is fairly extensive so I won't repeat it here. (if document is unclear 
we have to patch it).

I expect a call like:

   tr.addfilegenerator('dirstate', ('dirstate', self._write,
                       location='plain')

Where self._write is a method accepting an (already open) file object 
and writing dirstate content in it.

addfilegenerator can be called multiple time, each call replacing the other.

related to point 3.3: You have to let the transaction handle the final 
write too. The reason you have dirstate being handled by the transaction 
logic (if present) is that you never want to write the dirstate to disk 
(except pending) until the transaction data are visible too. And the 
dirstate/guard object have no way to know when this will happen. The 
transaction object is the one who know about that.

Lets looks at the following case:

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the dirstate change
       othercrap()
       tr.close() # commit the transaction

In this case the "ds.close()" is not the right time to write the file 
down, it is too early. It has to be written down at 'tr.close()' time.
By chance, the transaction will automatically take care of that since 
you used `addfilegenerator`.

summary: your .close() must not touch disk in the in-transaction case. 
Will handle it.

The final writing of the dirstate file will not be a simple move of the 
'.pending' file. This is necessary because the pending are lazily 
generated. Lets looks at the following scenario:

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the move
       tr.pending()
       with ds:
            ds.setparent()
            ds.close() # confirm the move

       tr.close() # commit the transaction

In this case, the .pending file is outdated compared to the in-memory 
state. So we need to regenerate the file anyway.

summary: transaction won't use simple rename.

related to point 3.1: You also have to be very careful of not leaving 
any '.pending' file around. Lets look at the following scenario.

  $ hg commit # write dirstate.pending, leave it behind
  $ hg up 42 # does not need to write a pending file
  $ hg pull # transaction call hook on pending change, dirstate is read
            # from .pending becaues it exists

So we have to make sure we do not leave the file behind. By chance, the 
transaction will take care of that for you. It keep track of all 
temporary file it generated and make sure they are cleaned up when the 
transaction is closed or rollbacked.

(We should probably aggressively nuke all known '.pending; type file at 
transaction open time but I'm not sure it is done yet.)

other nice thing: the transaction logic will also backup your file (and 
restore in rollback case) your dirstate file as soon as your register a 
generator for it.

Developer Warning:
-----------------

To work your code (will) make multiple assumption:

- wlock is alway taken when we touch the dirstate,
- write are always done in a dirstate guard context,
- we do not do any direct writing if a transaction exists,

It is easy to check this is true, but we crashing in these case is 
suboptimal since it will affect human user of unsafe, but already 
existing code. By chance we now have a "developer warning concept" (that 
you get from `devel.all=true` config) and the default behavior for tests 
is now to run them. So adding code to detect and issue warning in this 
case will let the code base be cleaned up over time.
Katsunori FUJIWARA - May 22, 2015, 4:34 p.m.
At Thu, 21 May 2015 16:59:47 -0500,
Pierre-Yves David wrote:
> 
> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1432051569 -32400
> > #      Wed May 20 01:06:09 2015 +0900
> > # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> > # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> > dirstate: read from `.pending` file under HG_PENDING mode
> >
> > True/False value of `_pendingmode` means whether `dirstate.pending` is
> > used to initialize own `_map` and so on. When it is None, neither
> > `dirstate` nor `dirstate.pending` is read in yet.
> >
> > This is used to keep consistent view between `_pl()` and `_read()`.
> >
> > Once `_pendingmode` is determined by reading one of `dirstate` or
> > `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> > is invoked. This should be reasonable, because:
> >
> > - effective `invalidate()` invocation should occur only in wlock scope, and
> > - wlock can't be gotten under HG_PENDING mode
> >
> > For review-ability, this patch focuses only on reading `.pending` file
> > in. Then, `dirstate.write()` can write changes out even under
> > HG_PENDING mode. But it is still safe enough, because there is no code
> > path writing `.pending` file out yet (= `_pendingmode` never be True).
> >
> > `_trypending()` is defined as a normal function to factor similar code
> > path (in bookmarks and phases) out in the future easily.
> 
> This series is poking in the right direction, but it is failing to 
> acknowledge the following list of principle and will not result in a 
> working behavior.
> 
> 1) '.pending' files are made necessary by the existence of a transaction,
> 2) HG_PENDING (and .pending files usage) must -only- occured in the 
> context of a transaction.
> 3) The transaction object is the one handle the '.pending' files and 
> associated non-pending file. This includes:
> 
> 3.1) The transaction must purge any existing '.pending' when it open/close
> 3.2) generation of .pending files is done through the transaction
> 3.3) writing of the final files is also done through the transaction
> 
> *) We now have a developer warning feature, we should use it more.
> 
> Let me details this more.

Thank you for detailed explanation !

And, perhaps, should I post the patch below not as the first one of
next series but as the last one of this series ?

I put this into next series, because this seemed to increase
review-ability of client code paths (= making in-memory changes
visible to external process) of these functions in next series.

    ====================
    diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
    --- a/mercurial/dirstate.py
    +++ b/mercurial/dirstate.py
    @@ -632,6 +632,35 @@ class dirstate(object):
                 return
             self._writedirstate(self._filename)
    
    +    def delayupdate(self, tr):
    +        '''Delay visibility of dirstate to other readers
    +        '''
    +        category = 'dirstate-%i' % id(self)
    +
    +        tr.addpending(category, self._writepending)
    +
    +        # pending file should be fixed up regardless of outcome of transaction
    +        tr.addfinalize(category, self._fixuppending)
    +        tr.addabort(category, self._fixuppending)
    +
    +    def makependingvisible(self, repo):
    +        '''Make pending disrtate changes visible to external process
    +
    +        This is an utility to treat pending changes regardless of
    +        transaction activity.
    +
    +        This returns the value to be set to 'HG_PENDING' environment
    +        variable for external process.
    +        '''
    +        tr = repo.currenttransaction()
    +        if tr:
    +            self.delayupdate(tr)
    +            if tr.writepending():
    +                return repo.root # set 'HG_PENDING' only in this case
    +        else:
    +            self.write()
    +        return ""
    +
         def _removepending(self):
             try:
                 self._opener.unlink(self._pendingfilename)
    ====================

This assumes that:

  - `dirstate.delayupdate()` will be used in certain transaction scope
    (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`

  - `dirstate.makependingvisible()` will be used in other code paths
    (e.g. for commit editor, `precommit` hook and so on), where
    transaction activity is determined only at runtime.

I think that using these functions can satisfy rules you described
above. How about this ?


I chose `addpending()` instead of `addfilegenerator()` to write
dirstate pending file out, because assumption below of the latter
seems not suitable for dirstate characteristic:

  - changes will be written out at successful `transaction.close()`
  - pending file will be discarded at transaction failure

As described in patch #4 of this series, dirstate changes should be
written out (at releasing wlock) regardless of outcome of transaction
in almost all cases.

Then, I chose registering `dirstate._fixuppending` via `addfinalize()`
and `addabort()` in `dirstate.delayupdate()` to certainly cleanup
(remove or rename) pending file at the end of transaction.

This can also delay writing dirstate changes out until releasing
wlock, even though it may be a little violation of (3.3).

For example, rebasing revisions causes one transaction for each
revisions to be rebased in a wlock scope. In this case, it is enough
to write dirstate changes out only at releasing wlock, if no external
(hook) process is invoked while rebasing.


`dirstate.write()` and `dirstate.invalidate()` may be invoked even in
transaction scope (e.g. via `dirstateguard` in import, amend and so
on).

This is reason why `_removepending()` and `_fixuppending()` are
invoked in them.


BTW:

> Developer Warning:
> -----------------
> 
> To work your code (will) make multiple assumption:
> 
[snip]
> - write are always done in a dirstate guard context,

Even though `dirstateguard` is typical `dirstate.write()` caller, I
don't assume above. But are there something to indicate or suggest so ?
I may overllook them :-<



> 1) Transaction make '.pending' necessary.
> -----------------------------------------
> 
> The goal of transaction is to prevent any external viewer to see its 
> content before it is committed. So anything happening during the 
> transaction is actually written to disk only at `tr.close()` time.
> 
> To lets hooks and tools see the content of the transaction, the 
> 'pending' mechanism is used. An environment variable let sub-invocation 
> of hg know that it needs to looks for extra data in different file.
> 
> 
> 
> 2) pending happen during transaction only
> -----------------------------------------
> 
> In the general case, we do not use the same 'pending' mechanism for 
> dirstate related changes for the following reason:
> 
> - dirstate is related to the working copy. Most external viewer like 
> clone/pull/push do not care about the working copy so we are not at risk 
> of uncommited data escaping the repository.
> 
> - dirstate is related to the content of working copy (parents, but also 
> file content). We cannot have an atomic update of all the file in the 
> working copy so we'll have transient invalid state during the update 
> anyway, therefor we are not even trying to ensure we have an atomic 
> transaction for external viewer.
> 
> However, there is a case where we need dirstate to use a '.pending' 
> mechanism: When the dirstate change makes it refer to content contained 
> in a not-yet committed transaction.
> 
> - we do not want to make such dirstate visible until the transaction 
> content is visible.
> 
> - sub-invocation of mercurial in hook/tool need to be able to see this 
> content anyway.
> 
> To simplify things. I'm assuming:
> 
>    "dirstate refer to content in the transaction."
> 
> is the same as:
> 
>    "dirstate change happen during a transaction."
> 
> 
> So, if your dirstate change happens inside a transaction, it should be 
> collaborating with it. Your series have to make use of 
> the`repo.currenttransaction()` method at least once somewhere.
> 
> 3) The transaction object is the one handling file
> --------------------------------------------------
> 
> To simplify entry points and ensure consistency, the transaction is 
> handling all files involved in the transaction from end to end. So when 
> dirstate collaborate with the transaction, it has to let the transaction 
> handle the dirstate file
> 
> 
> related to point 3.2: We are generating `pending` file on demand, 
> because if no subprocess is to be call, we do not need to generated 
> them. Every subcall that can need pending file will use call 
> `tr.pending()` to trigger this generation. This is a single entry point 
> that the code know and use. The function will also handle the case where 
> there is no pending changes and not special logic is needed.
> 
> To get the transaction able to include your pending file, you have to 
> register a "file generator" with `tr.addfilegenerator` the documentation 
> is fairly extensive so I won't repeat it here. (if document is unclear 
> we have to patch it).
> 
> I expect a call like:
> 
>    tr.addfilegenerator('dirstate', ('dirstate', self._write,
>                        location='plain')
> 
> Where self._write is a method accepting an (already open) file object 
> and writing dirstate content in it.
> 
> addfilegenerator can be called multiple time, each call replacing the other.
> 
> related to point 3.3: You have to let the transaction handle the final 
> write too. The reason you have dirstate being handled by the transaction 
> logic (if present) is that you never want to write the dirstate to disk 
> (except pending) until the transaction data are visible too. And the 
> dirstate/guard object have no way to know when this will happen. The 
> transaction object is the one who know about that.
> 
> Lets looks at the following case:
> 
>    with tr:
>        commit = newcommit()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the dirstate change
>        othercrap()
>        tr.close() # commit the transaction
> 
> In this case the "ds.close()" is not the right time to write the file 
> down, it is too early. It has to be written down at 'tr.close()' time.
> By chance, the transaction will automatically take care of that since 
> you used `addfilegenerator`.
> 
> summary: your .close() must not touch disk in the in-transaction case. 
> Will handle it.
> 
> The final writing of the dirstate file will not be a simple move of the 
> '.pending' file. This is necessary because the pending are lazily 
> generated. Lets looks at the following scenario:
> 
>    with tr:
>        commit = newcommit()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the move
>        tr.pending()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the move
> 
>        tr.close() # commit the transaction
> 
> In this case, the .pending file is outdated compared to the in-memory 
> state. So we need to regenerate the file anyway.
> 
> summary: transaction won't use simple rename.
> 
> related to point 3.1: You also have to be very careful of not leaving 
> any '.pending' file around. Lets look at the following scenario.
> 
>   $ hg commit # write dirstate.pending, leave it behind
>   $ hg up 42 # does not need to write a pending file
>   $ hg pull # transaction call hook on pending change, dirstate is read
>             # from .pending becaues it exists
> 
> So we have to make sure we do not leave the file behind. By chance, the 
> transaction will take care of that for you. It keep track of all 
> temporary file it generated and make sure they are cleaned up when the 
> transaction is closed or rollbacked.
> 
> (We should probably aggressively nuke all known '.pending; type file at 
> transaction open time but I'm not sure it is done yet.)
> 
> other nice thing: the transaction logic will also backup your file (and 
> restore in rollback case) your dirstate file as soon as your register a 
> generator for it.
> 
> Developer Warning:
> -----------------
> 
> To work your code (will) make multiple assumption:
> 
> - wlock is alway taken when we touch the dirstate,
> - write are always done in a dirstate guard context,
> - we do not do any direct writing if a transaction exists,
> 
> It is easy to check this is true, but we crashing in these case is 
> suboptimal since it will affect human user of unsafe, but already 
> existing code. By chance we now have a "developer warning concept" (that 
> you get from `devel.all=true` config) and the default behavior for tests 
> is now to run them. So adding code to detect and issue warning in this 
> case will let the code base be cleaned up over time.
> 
> -- 
> Pierre-Yves David
> 


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - May 22, 2015, 4:57 p.m.
On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
>
> At Thu, 21 May 2015 16:59:47 -0500,
> Pierre-Yves David wrote:
>>
>> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
>>> # HG changeset patch
>>> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
>>> # Date 1432051569 -32400
>>> #      Wed May 20 01:06:09 2015 +0900
>>> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
>>> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
>>> dirstate: read from `.pending` file under HG_PENDING mode
>>>
>>> True/False value of `_pendingmode` means whether `dirstate.pending` is
>>> used to initialize own `_map` and so on. When it is None, neither
>>> `dirstate` nor `dirstate.pending` is read in yet.
>>>
>>> This is used to keep consistent view between `_pl()` and `_read()`.
>>>
>>> Once `_pendingmode` is determined by reading one of `dirstate` or
>>> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
>>> is invoked. This should be reasonable, because:
>>>
>>> - effective `invalidate()` invocation should occur only in wlock scope, and
>>> - wlock can't be gotten under HG_PENDING mode
>>>
>>> For review-ability, this patch focuses only on reading `.pending` file
>>> in. Then, `dirstate.write()` can write changes out even under
>>> HG_PENDING mode. But it is still safe enough, because there is no code
>>> path writing `.pending` file out yet (= `_pendingmode` never be True).
>>>
>>> `_trypending()` is defined as a normal function to factor similar code
>>> path (in bookmarks and phases) out in the future easily.
>>
>> This series is poking in the right direction, but it is failing to
>> acknowledge the following list of principle and will not result in a
>> working behavior.
>>
>> 1) '.pending' files are made necessary by the existence of a transaction,
>> 2) HG_PENDING (and .pending files usage) must -only- occured in the
>> context of a transaction.
>> 3) The transaction object is the one handle the '.pending' files and
>> associated non-pending file. This includes:
>>
>> 3.1) The transaction must purge any existing '.pending' when it open/close
>> 3.2) generation of .pending files is done through the transaction
>> 3.3) writing of the final files is also done through the transaction
>>
>> *) We now have a developer warning feature, we should use it more.
>>
>> Let me details this more.
>
> Thank you for detailed explanation !
>
> And, perhaps, should I post the patch below not as the first one of
> next series but as the last one of this series ?
>
> I put this into next series, because this seemed to increase
> review-ability of client code paths (= making in-memory changes
> visible to external process) of these functions in next series.
>
>      ====================
>      diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
>      --- a/mercurial/dirstate.py
>      +++ b/mercurial/dirstate.py
>      @@ -632,6 +632,35 @@ class dirstate(object):
>                   return
>               self._writedirstate(self._filename)
>
>      +    def delayupdate(self, tr):
>      +        '''Delay visibility of dirstate to other readers
>      +        '''
>      +        category = 'dirstate-%i' % id(self)
>      +
>      +        tr.addpending(category, self._writepending)
>      +
>      +        # pending file should be fixed up regardless of outcome of transaction
>      +        tr.addfinalize(category, self._fixuppending)
>      +        tr.addabort(category, self._fixuppending)
>      +
>      +    def makependingvisible(self, repo):
>      +        '''Make pending disrtate changes visible to external process
>      +
>      +        This is an utility to treat pending changes regardless of
>      +        transaction activity.
>      +
>      +        This returns the value to be set to 'HG_PENDING' environment
>      +        variable for external process.
>      +        '''
>      +        tr = repo.currenttransaction()
>      +        if tr:
>      +            self.delayupdate(tr)
>      +            if tr.writepending():
>      +                return repo.root # set 'HG_PENDING' only in this case
>      +        else:
>      +            self.write()
>      +        return ""
>      +
>           def _removepending(self):
>               try:
>                   self._opener.unlink(self._pendingfilename)
>      ====================
>
> This assumes that:
>
>    - `dirstate.delayupdate()` will be used in certain transaction scope
>      (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
>
>    - `dirstate.makependingvisible()` will be used in other code paths
>      (e.g. for commit editor, `precommit` hook and so on), where
>      transaction activity is determined only at runtime.
>
> I think that using these functions can satisfy rules you described
> above. How about this ?
>
>
> I chose `addpending()` instead of `addfilegenerator()` to write
> dirstate pending file out, because assumption below of the latter
> seems not suitable for dirstate characteristic:
>
>    - changes will be written out at successful `transaction.close()`
>    - pending file will be discarded at transaction failure
>
> As described in patch #4 of this series, dirstate changes should be
> written out (at releasing wlock) regardless of outcome of transaction
> in almost all cases.

This assumption are a surprise to me. Can you explain why we do not want 
change to the dirstate that at "done withing a transaction" to be 
rollbacked during that transaction.

Practical example will probably help to transfer your knowledge in my 
direction :)

>> Developer Warning:
>> -----------------
>>
>> To work your code (will) make multiple assumption:
>>
> [snip]
>> - write are always done in a dirstate guard context,
>
> Even though `dirstateguard` is typical `dirstate.write()` caller, I
> don't assume above. But are there something to indicate or suggest so ?
> I may overllook them :-<

I'm far less familiar with this topic than you. But I cannot think of a 
case were we would want to update the dirstate content without some sort 
of transactional protect that guard is offering.

Do you have an example ?
Katsunori FUJIWARA - May 23, 2015, 8:08 p.m.
At Fri, 22 May 2015 11:57:50 -0500,
Pierre-Yves David wrote:
> 
> On 05/22/2015 11:34 AM, FUJIWARA Katsunori wrote:
> >
> > At Thu, 21 May 2015 16:59:47 -0500,
> > Pierre-Yves David wrote:
> >>
> >> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> >>> # HG changeset patch
> >>> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> >>> # Date 1432051569 -32400
> >>> #      Wed May 20 01:06:09 2015 +0900
> >>> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> >>> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> >>> dirstate: read from `.pending` file under HG_PENDING mode
> >>>
> >>> True/False value of `_pendingmode` means whether `dirstate.pending` is
> >>> used to initialize own `_map` and so on. When it is None, neither
> >>> `dirstate` nor `dirstate.pending` is read in yet.
> >>>
> >>> This is used to keep consistent view between `_pl()` and `_read()`.
> >>>
> >>> Once `_pendingmode` is determined by reading one of `dirstate` or
> >>> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> >>> is invoked. This should be reasonable, because:
> >>>
> >>> - effective `invalidate()` invocation should occur only in wlock scope, and
> >>> - wlock can't be gotten under HG_PENDING mode
> >>>
> >>> For review-ability, this patch focuses only on reading `.pending` file
> >>> in. Then, `dirstate.write()` can write changes out even under
> >>> HG_PENDING mode. But it is still safe enough, because there is no code
> >>> path writing `.pending` file out yet (= `_pendingmode` never be True).
> >>>
> >>> `_trypending()` is defined as a normal function to factor similar code
> >>> path (in bookmarks and phases) out in the future easily.
> >>
> >> This series is poking in the right direction, but it is failing to
> >> acknowledge the following list of principle and will not result in a
> >> working behavior.
> >>
> >> 1) '.pending' files are made necessary by the existence of a transaction,
> >> 2) HG_PENDING (and .pending files usage) must -only- occured in the
> >> context of a transaction.
> >> 3) The transaction object is the one handle the '.pending' files and
> >> associated non-pending file. This includes:
> >>
> >> 3.1) The transaction must purge any existing '.pending' when it open/close
> >> 3.2) generation of .pending files is done through the transaction
> >> 3.3) writing of the final files is also done through the transaction
> >>
> >> *) We now have a developer warning feature, we should use it more.
> >>
> >> Let me details this more.
> >
> > Thank you for detailed explanation !
> >
> > And, perhaps, should I post the patch below not as the first one of
> > next series but as the last one of this series ?
> >
> > I put this into next series, because this seemed to increase
> > review-ability of client code paths (= making in-memory changes
> > visible to external process) of these functions in next series.
> >
> >      ====================
> >      diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> >      --- a/mercurial/dirstate.py
> >      +++ b/mercurial/dirstate.py
> >      @@ -632,6 +632,35 @@ class dirstate(object):
> >                   return
> >               self._writedirstate(self._filename)
> >
> >      +    def delayupdate(self, tr):
> >      +        '''Delay visibility of dirstate to other readers
> >      +        '''
> >      +        category = 'dirstate-%i' % id(self)
> >      +
> >      +        tr.addpending(category, self._writepending)
> >      +
> >      +        # pending file should be fixed up regardless of outcome of transaction
> >      +        tr.addfinalize(category, self._fixuppending)
> >      +        tr.addabort(category, self._fixuppending)
> >      +
> >      +    def makependingvisible(self, repo):
> >      +        '''Make pending disrtate changes visible to external process
> >      +
> >      +        This is an utility to treat pending changes regardless of
> >      +        transaction activity.
> >      +
> >      +        This returns the value to be set to 'HG_PENDING' environment
> >      +        variable for external process.
> >      +        '''
> >      +        tr = repo.currenttransaction()
> >      +        if tr:
> >      +            self.delayupdate(tr)
> >      +            if tr.writepending():
> >      +                return repo.root # set 'HG_PENDING' only in this case
> >      +        else:
> >      +            self.write()
> >      +        return ""
> >      +
> >           def _removepending(self):
> >               try:
> >                   self._opener.unlink(self._pendingfilename)
> >      ====================
> >
> > This assumes that:
> >
> >    - `dirstate.delayupdate()` will be used in certain transaction scope
> >      (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
> >
> >    - `dirstate.makependingvisible()` will be used in other code paths
> >      (e.g. for commit editor, `precommit` hook and so on), where
> >      transaction activity is determined only at runtime.
> >
> > I think that using these functions can satisfy rules you described
> > above. How about this ?
> >
> >
> > I chose `addpending()` instead of `addfilegenerator()` to write
> > dirstate pending file out, because assumption below of the latter
> > seems not suitable for dirstate characteristic:
> >
> >    - changes will be written out at successful `transaction.close()`
> >    - pending file will be discarded at transaction failure
> >
> > As described in patch #4 of this series, dirstate changes should be
> > written out (at releasing wlock) regardless of outcome of transaction
> > in almost all cases.
> 
> This assumption are a surprise to me. Can you explain why we do not want 
> change to the dirstate that at "done withing a transaction" to be 
> rollbacked during that transaction.
> 
> Practical example will probably help to transfer your knowledge in my 
> direction :)

I categorized cases below:

 1. changing dirstate inside transaction:

    - qpush
    - amend
    - import
      now, dirstate changes inside transaction are explicitly
      discarded at failure by `dirstateguard`

      => OK, we can use `addfilegenerator()` in this case !

    - transplant
    - unshelve
      dirstate is always written out at releasing wlock, and:

      - for InterventionRequired or TransplantError, transaction is
        manually `close`-ed
      - for other exceptions, transaction is rollback-ed

      => Can we omit writing dirstate changes out at failure (= the
         latter above) in this case ?

         IMHO, probably "yes", because current dirstate parents
         already become "unknown" by rollback-ing changelog.

 2. changing dirstate inside wlock, but outside (= before) transaction:

    => We should keep changes regardless of outcome of transaction in
       this case for subsequent operations, like `--continue` or so

       (e.g. `repo.commit()` may fail for pretxncommit hook detecting
       violation of team specific rules)

       This means that changes held in `.pending` are still needed
       even if transaction is rollback-ed.

    - backout
    - graft
    - histedit
    - rebase (*1)

      (*1) Strictly speaking, `dirstateguard` saves changes before
           `repo.commit()`, so we can use `addfilegenerator()` for
           rebase. But let's forget this to make the problem simple :-)

IMHO, we should mainly think about (2) case above.

It is fact that invoking `dirstate.write()` explicitly before
`repo.commit()` or so makes (2) case suitable for use of
`addfilegenerator()`. In other words, "let's use clean dirstate inside
transaction !".

But on the other hand, it is a little inefficient, because dirstate
changes are written out at each continuous `repo.commit()` invocations
(while graft, histedit and rebase), even if external process is never
invoked.


> >> Developer Warning:
> >> -----------------
> >>
> >> To work your code (will) make multiple assumption:
> >>
> > [snip]
> >> - write are always done in a dirstate guard context,
> >
> > Even though `dirstateguard` is typical `dirstate.write()` caller, I
> > don't assume above. But are there something to indicate or suggest so ?
> > I may overllook them :-<
> 
> I'm far less familiar with this topic than you. But I cannot think of a 
> case were we would want to update the dirstate content without some sort 
> of transactional protect that guard is offering.
> 
> Do you have an example ?

AFAIK, there are still some explicit `dirstate.write()` invocations
outside transaction/dirstateguard scope:

  - in `rebase.rebasenode()`
  - in `strip.strip()` and `strip.stripcmd()`

    These seem to try making in-memory changes visible to external
    process spawned at subsequent `merge.update()` (or, save
    intermediate changes for safety ?).

    IMHO, these can be removed after making in-memory dirstate changes
    visible :-)

  - in `merge.graft()` (used by graft and histedit)

    This wants to ensure dropping the second merge parent, even if
    subsequent process is aborted.

    This is needed, if `copies.duplicatecopies()` may fail in some
    cases (or we may have to replace it by `dirstateguard` or so, to
    deprecate explicit `dirstate.write` invocation)


> -- 
> Pierre-Yves David
> 


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -26,6 +26,22 @@ 
     def join(self, obj, fname):
         return obj._join(fname)
 
+def _trypending(root, vfs, filename):
+    '''Open  file to be read according to HG_PENDING environment variable
+
+    This opens `.pending` of specified `filename` only when HG_PENDING
+    is equal to `root`.
+
+    This returns `(fp, is_pending_opened)` tuple.
+    '''
+    if root == os.environ.get('HG_PENDING'):
+        try:
+            return (vfs('%s.pending' % filename), True)
+        except IOError, inst:
+            if inst.errno != errno.ENOENT:
+                raise
+    return (vfs(filename), False)
+
 class dirstate(object):
 
     def __init__(self, opener, ui, root, validate):
@@ -49,6 +65,9 @@ 
         self._parentwriters = 0
         self._filename = 'dirstate'
 
+        # for consitent view between _pl() and _read() invocations
+        self._pendingmode = None
+
     def beginparentchange(self):
         '''Marks the beginning of a set of changes that involve changing
         the dirstate parents. If there is an exception during this time,
@@ -122,7 +141,7 @@ 
     @propertycache
     def _pl(self):
         try:
-            fp = self._opener(self._filename)
+            fp = self._opendirstatefile()
             st = fp.read(40)
             fp.close()
             l = len(st)
@@ -316,11 +335,20 @@ 
             f.discard()
             raise
 
+    def _opendirstatefile(self):
+        fp, mode = _trypending(self._root, self._opener, self._filename)
+        if self._pendingmode is not None and self._pendingmode != mode:
+            fp.close()
+            raise util.Abort(_('working directory state may be '
+                               'changed parallelly'))
+        self._pendingmode = mode
+        return fp
+
     def _read(self):
         self._map = {}
         self._copymap = {}
         try:
-            fp = self._opener.open(self._filename)
+            fp = self._opendirstatefile()
             try:
                 st = fp.read()
             finally: