Patchwork [1,of,2,STABLE] hooks: always include HG_PENDING

login
register
mail settings
Submitter Durham Goode
Date Nov. 4, 2015, 1:19 a.m.
Message ID <0a772553285d61aeea32.1446599977@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/11284/
State Accepted
Headers show

Comments

Durham Goode - Nov. 4, 2015, 1:19 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1446598693 28800
#      Tue Nov 03 16:58:13 2015 -0800
# Branch stable
# Node ID 0a772553285d61aeea3226f74698223b09fae7ac
# Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
hooks: always include HG_PENDING

Previously we would only include HG_PENDING in the hook args if the
transaction's writepending() actually wrote something. This is a bad criteria,
since it's possible that a previous call to writepending() wrote stuff and the
hooks want to still see that.

The solution is to always have hooks execute within the scope of the pending
changes by always putting HG_PENDING in the environment.
Katsunori FUJIWARA - Nov. 5, 2015, 12:26 a.m.
At Tue, 3 Nov 2015 17:19:37 -0800,
Durham Goode wrote:
> 
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1446598693 28800
> #      Tue Nov 03 16:58:13 2015 -0800
> # Branch stable
> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
> hooks: always include HG_PENDING
> 
> Previously we would only include HG_PENDING in the hook args if the
> transaction's writepending() actually wrote something. This is a bad criteria,
> since it's possible that a previous call to writepending() wrote stuff and the
> hooks want to still see that.
> 
> The solution is to always have hooks execute within the scope of the pending
> changes by always putting HG_PENDING in the environment.
> 
> diff --git a/mercurial/hook.py b/mercurial/hook.py
> --- a/mercurial/hook.py
> +++ b/mercurial/hook.py
> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args, 
>      # make in-memory changes visible to external process
>      tr = repo.currenttransaction()
>      repo.dirstate.write(tr)
> -    if tr and tr.writepending():
> +    if tr:
> +        tr.writepending()
>          env['HG_PENDING'] = repo.root
>  
>      for k, v in args.iteritems():

'transaction.writepending()' saves whether there is any pending data
or not into 'self._anypending'.

    https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350

    @active
    def writepending(self):
        '''write pending file to temporary version

        This is used to allow hooks to view a transaction before commit'''
        categories = sorted(self._pendingcallback)
        for cat in categories:
            # remove callback since the data will have been flushed
            any = self._pendingcallback.pop(cat)(self)
            self._anypending = self._anypending or any
        self._anypending |= self._generatefiles(suffix='.pending')
        return self._anypending

It should return True as expected, if pending data has been written out
once.

The root cause of this issue seems that 'transaction.writepending()'
removes pending callback invoked once, even if pending callback didn't
actually write changes out yet.

This removal of callbacks is reasonable, if it is assumed that pending
callback may not be "idempotent" (e.g. writing into the file opened
with "append" mode).

But on the other hand, this removal may cause similar issue in other
code paths = other hook combinations (or introducing new hooks).

To avoid such similar issues:

  - invoke 'changelog.delayupdate()' explicitly before
    'transaction.writepending()' for external hook invocation, like
    'dirstate.write(tr)' above

    e.g. in _exthook()@hook.py

        tr = repo.currenttransaction()
        repo.dirstate.write(tr)
        if tr:
            repo.changelog.delayupdate(tr)
            if tr.writepending()
                env['HG_PENDING'] = repo.root

  - invoke 'changelog.delayupdate()' internally in changelog by
    overriding functions below, which open files in 'w'/'a' mode by
    'self.opener'

    - revlog.addgroup
    - revlog.addrevision
    - revlog.checkinlinesize

    e.g.:

    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
                    node=None)
        self.delayupdate(transaction)
        return super(changelog, self).addrevision(.....)

The former still needs 'changelog.delayupdate()' before starting
changes to setup opener (and other internal status) of changelog, but
the latter doesn't.

I think the latter is reasonable. (yes, the former is just for
comparison :-)) How about these ?


BTW, #1 of this series causes that HG_PENDING is always passed to
external hooks while transaction running, even if there is no pending
data. Is this reasonable ? I think that this should be backed out to
avoid unintentional (trial) accessing to pending files.


> diff --git a/tests/test-hook.t b/tests/test-hook.t
> --- a/tests/test-hook.t
> +++ b/tests/test-hook.t
> @@ -113,7 +113,7 @@ test generic hooks
>    $ hg pull ../a
>    pulling from ../a
>    searching for changes
> -  prechangegroup hook: HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
> +  prechangegroup hook: HG_PENDING=$TESTTMP/b HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
>    adding changesets
>    adding manifests
>    adding file changes
> @@ -272,7 +272,7 @@ test that prepushkey can prevent incomin
>    listkeys hook: HG_NAMESPACE=bookmarks HG_VALUES={'bar': '0000000000000000000000000000000000000000', 'foo': '0000000000000000000000000000000000000000'}
>    no changes found
>    pretxnopen hook: HG_TXNID=TXN:* HG_TXNNAME=push (glob)
> -  prepushkey.forbid hook: HG_BUNDLE2=1 HG_KEY=baz HG_NAMESPACE=bookmarks HG_NEW=0000000000000000000000000000000000000000 HG_SOURCE=push HG_TXNID=TXN:* HG_URL=push (glob)
> +  prepushkey.forbid hook: HG_BUNDLE2=1 HG_KEY=baz HG_NAMESPACE=bookmarks HG_NEW=0000000000000000000000000000000000000000 HG_PENDING=$TESTTMP/a HG_SOURCE=push HG_TXNID=TXN:* HG_URL=push (glob)
>    pushkey-abort: prepushkey hook exited with status 1
>    abort: exporting bookmark baz failed!
>    [255]
> @@ -306,7 +306,7 @@ prechangegroup hook can prevent incoming
>    $ hg pull ../a
>    pulling from ../a
>    searching for changes
> -  prechangegroup.forbid hook: HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
> +  prechangegroup.forbid hook: HG_PENDING=$TESTTMP/b HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
>    abort: prechangegroup.forbid hook exited with status 1
>    [255]
>  
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Durham Goode - Nov. 5, 2015, 1:11 a.m.
On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
> At Tue, 3 Nov 2015 17:19:37 -0800,
> Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1446598693 28800
>> #      Tue Nov 03 16:58:13 2015 -0800
>> # Branch stable
>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>> hooks: always include HG_PENDING
>>
>> Previously we would only include HG_PENDING in the hook args if the
>> transaction's writepending() actually wrote something. This is a bad criteria,
>> since it's possible that a previous call to writepending() wrote stuff and the
>> hooks want to still see that.
>>
>> The solution is to always have hooks execute within the scope of the pending
>> changes by always putting HG_PENDING in the environment.
>>
>> diff --git a/mercurial/hook.py b/mercurial/hook.py
>> --- a/mercurial/hook.py
>> +++ b/mercurial/hook.py
>> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
>>       # make in-memory changes visible to external process
>>       tr = repo.currenttransaction()
>>       repo.dirstate.write(tr)
>> -    if tr and tr.writepending():
>> +    if tr:
>> +        tr.writepending()
>>           env['HG_PENDING'] = repo.root
>>   
>>       for k, v in args.iteritems():
> 'transaction.writepending()' saves whether there is any pending data
> or not into 'self._anypending'.
>
>      https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
>
>      @active
>      def writepending(self):
>          '''write pending file to temporary version
>
>          This is used to allow hooks to view a transaction before commit'''
>          categories = sorted(self._pendingcallback)
>          for cat in categories:
>              # remove callback since the data will have been flushed
>              any = self._pendingcallback.pop(cat)(self)
>              self._anypending = self._anypending or any
>          self._anypending |= self._generatefiles(suffix='.pending')
>          return self._anypending
>
> It should return True as expected, if pending data has been written out
> once.
>
> The root cause of this issue seems that 'transaction.writepending()'
> removes pending callback invoked once, even if pending callback didn't
> actually write changes out yet.
>
> This removal of callbacks is reasonable, if it is assumed that pending
> callback may not be "idempotent" (e.g. writing into the file opened
> with "append" mode).
>
> But on the other hand, this removal may cause similar issue in other
> code paths = other hook combinations (or introducing new hooks).
>
> To avoid such similar issues:
>
>    - invoke 'changelog.delayupdate()' explicitly before
>      'transaction.writepending()' for external hook invocation, like
>      'dirstate.write(tr)' above
>
>      e.g. in _exthook()@hook.py
>
>          tr = repo.currenttransaction()
>          repo.dirstate.write(tr)
>          if tr:
>              repo.changelog.delayupdate(tr)
>              if tr.writepending()
>                  env['HG_PENDING'] = repo.root
>
>    - invoke 'changelog.delayupdate()' internally in changelog by
>      overriding functions below, which open files in 'w'/'a' mode by
>      'self.opener'
>
>      - revlog.addgroup
>      - revlog.addrevision
>      - revlog.checkinlinesize
>
>      e.g.:
>
>      def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
>                      node=None)
>          self.delayupdate(transaction)
>          return super(changelog, self).addrevision(.....)
>
> The former still needs 'changelog.delayupdate()' before starting
> changes to setup opener (and other internal status) of changelog, but
> the latter doesn't.
>
> I think the latter is reasonable. (yes, the former is just for
> comparison :-)) How about these ?
I think your second proposal sounds reasonable.  And will catch any 
cases that are currently missing it.
>
>
> BTW, #1 of this series causes that HG_PENDING is always passed to
> external hooks while transaction running, even if there is no pending
> data. Is this reasonable ? I think that this should be backed out to
> avoid unintentional (trial) accessing to pending files.
Wouldn't we always want hooks to be able to see the current state of the 
repo? This was originally added because there was a test failure if I 
didn't add it.  I must've changed my implementation of #2 because now 
there is no test failure if I exclude #1.  So I'm ok backing out #1 if 
you think we should.
Durham Goode - Nov. 5, 2015, 5 a.m.
On 11/4/15 5:11 PM, Durham Goode wrote:
>
>
> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
>> At Tue, 3 Nov 2015 17:19:37 -0800,
>> Durham Goode wrote:
>>> # HG changeset patch
>>> # User Durham Goode <durham@fb.com>
>>> # Date 1446598693 28800
>>> #      Tue Nov 03 16:58:13 2015 -0800
>>> # Branch stable
>>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>>> hooks: always include HG_PENDING
>>>
>>> Previously we would only include HG_PENDING in the hook args if the
>>> transaction's writepending() actually wrote something. This is a bad 
>>> criteria,
>>> since it's possible that a previous call to writepending() wrote 
>>> stuff and the
>>> hooks want to still see that.
>>>
>>> The solution is to always have hooks execute within the scope of the 
>>> pending
>>> changes by always putting HG_PENDING in the environment.
Matt and foozy, on a related topic:

The change to unify the HG_PENDING stuff to hooks._exthook() also means 
that tr.writepending() is not called and HG_PENDING is not set for 
python hooks. This broke one of our python hooks which drives our other 
hooks.  Matt, I know the python hook api provides no guarantees, but 
this could break any other python hooks that shell out.  So just fyi at 
the very least.
Katsunori FUJIWARA - Nov. 5, 2015, 7:18 p.m.
At Wed, 4 Nov 2015 17:11:12 -0800,
Durham Goode wrote:
> 
> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
> > At Tue, 3 Nov 2015 17:19:37 -0800,
> > Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1446598693 28800
> >> #      Tue Nov 03 16:58:13 2015 -0800
> >> # Branch stable
> >> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
> >> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
> >> hooks: always include HG_PENDING
> >>
> >> Previously we would only include HG_PENDING in the hook args if the
> >> transaction's writepending() actually wrote something. This is a bad criteria,
> >> since it's possible that a previous call to writepending() wrote stuff and the
> >> hooks want to still see that.
> >>
> >> The solution is to always have hooks execute within the scope of the pending
> >> changes by always putting HG_PENDING in the environment.
> >>
> >> diff --git a/mercurial/hook.py b/mercurial/hook.py
> >> --- a/mercurial/hook.py
> >> +++ b/mercurial/hook.py
> >> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
> >>       # make in-memory changes visible to external process
> >>       tr = repo.currenttransaction()
> >>       repo.dirstate.write(tr)
> >> -    if tr and tr.writepending():
> >> +    if tr:
> >> +        tr.writepending()
> >>           env['HG_PENDING'] = repo.root
> >>   
> >>       for k, v in args.iteritems():
> > 'transaction.writepending()' saves whether there is any pending data
> > or not into 'self._anypending'.
> >
> >      https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
> >
> >      @active
> >      def writepending(self):
> >          '''write pending file to temporary version
> >
> >          This is used to allow hooks to view a transaction before commit'''
> >          categories = sorted(self._pendingcallback)
> >          for cat in categories:
> >              # remove callback since the data will have been flushed
> >              any = self._pendingcallback.pop(cat)(self)
> >              self._anypending = self._anypending or any
> >          self._anypending |= self._generatefiles(suffix='.pending')
> >          return self._anypending
> >
> > It should return True as expected, if pending data has been written out
> > once.
> >
> > The root cause of this issue seems that 'transaction.writepending()'
> > removes pending callback invoked once, even if pending callback didn't
> > actually write changes out yet.
> >
> > This removal of callbacks is reasonable, if it is assumed that pending
> > callback may not be "idempotent" (e.g. writing into the file opened
> > with "append" mode).
> >
> > But on the other hand, this removal may cause similar issue in other
> > code paths = other hook combinations (or introducing new hooks).
> >
> > To avoid such similar issues:
> >
> >    - invoke 'changelog.delayupdate()' explicitly before
> >      'transaction.writepending()' for external hook invocation, like
> >      'dirstate.write(tr)' above
> >
> >      e.g. in _exthook()@hook.py
> >
> >          tr = repo.currenttransaction()
> >          repo.dirstate.write(tr)
> >          if tr:
> >              repo.changelog.delayupdate(tr)
> >              if tr.writepending()
> >                  env['HG_PENDING'] = repo.root
> >
> >    - invoke 'changelog.delayupdate()' internally in changelog by
> >      overriding functions below, which open files in 'w'/'a' mode by
> >      'self.opener'
> >
> >      - revlog.addgroup
> >      - revlog.addrevision
> >      - revlog.checkinlinesize
> >
> >      e.g.:
> >
> >      def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
> >                      node=None)
> >          self.delayupdate(transaction)
> >          return super(changelog, self).addrevision(.....)
> >
> > The former still needs 'changelog.delayupdate()' before starting
> > changes to setup opener (and other internal status) of changelog, but
> > the latter doesn't.
> >
> > I think the latter is reasonable. (yes, the former is just for
> > comparison :-)) How about these ?
> I think your second proposal sounds reasonable.  And will catch any 
> cases that are currently missing it.
> >
> >
> > BTW, #1 of this series causes that HG_PENDING is always passed to
> > external hooks while transaction running, even if there is no pending
> > data. Is this reasonable ? I think that this should be backed out to
> > avoid unintentional (trial) accessing to pending files.

> Wouldn't we always want hooks to be able to see the current state of the 
> repo? This was originally added because there was a test failure if I 
> didn't add it.  I must've changed my implementation of #2 because now 
> there is no test failure if I exclude #1.  So I'm ok backing out #1 if 
> you think we should.
> 

At first, as far as I confirmed, #2 can fix issue4934 (= test newly
added in #2 can run successfully) even without #1.

The root cause of issue4934 is that 'transaction.writepending()'
doesn't imply 'changelog._writepending()', which writes buffered
changes out, at the second hook invocation. Lack of HG_PENDING in this
case is just a side effect of it.

Therefore, it should be specification matter whether HG_PENDING can be
defined always, IMHO.

Pierre-Yves, does your describing "HG_PENDING must -only- occured in
the context of a transaction" mean that HG_PENDING can be defined even
if there is no pending data yet while transaction running ? (I think
it doesn't)

    https://www.mercurial-scm.org/wiki/DirstateTransactionPlan#Detail_about_HG_PENDING_mechanism

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Nov. 5, 2015, 7:19 p.m.
At Wed, 4 Nov 2015 21:00:30 -0800,
Durham Goode wrote:
> 
> 
> 
> On 11/4/15 5:11 PM, Durham Goode wrote:
> >
> >
> > On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
> >> At Tue, 3 Nov 2015 17:19:37 -0800,
> >> Durham Goode wrote:
> >>> # HG changeset patch
> >>> # User Durham Goode <durham@fb.com>
> >>> # Date 1446598693 28800
> >>> #      Tue Nov 03 16:58:13 2015 -0800
> >>> # Branch stable
> >>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
> >>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
> >>> hooks: always include HG_PENDING
> >>>
> >>> Previously we would only include HG_PENDING in the hook args if the
> >>> transaction's writepending() actually wrote something. This is a bad 
> >>> criteria,
> >>> since it's possible that a previous call to writepending() wrote 
> >>> stuff and the
> >>> hooks want to still see that.
> >>>
> >>> The solution is to always have hooks execute within the scope of the 
> >>> pending
> >>> changes by always putting HG_PENDING in the environment.

> Matt and foozy, on a related topic:
> 
> The change to unify the HG_PENDING stuff to hooks._exthook() also means 
> that tr.writepending() is not called and HG_PENDING is not set for 
> python hooks. This broke one of our python hooks which drives our other 
> hooks.  Matt, I know the python hook api provides no guarantees, but 
> this could break any other python hooks that shell out.  So just fyi at 
> the very least.
> 

If I understand correctly, your python hook does a kind of "stringify"
on each hook arguments explicitly like 'hook._exthook()', to get the
value of HG_PENDING for external process before 520defbc0335, doesn't it ?

    https://selenic.com/repo/hg/file/e7c618cee8df/mercurial/hook.py#l129

    for k, v in args.iteritems():
        if callable(v):
            v = v()
        if isinstance(v, dict):
            # make the dictionary element order stable across Python
            # implementations
            v = ('{' +
                 ', '.join('%r: %r' % i for i in sorted(v.iteritems())) +
                 '}')
        env['HG_' + k.upper()] = v

How about factoring this building environment variables (+ adding
HG_PENDING introduced by 520defbc0335) out from 'hook._exthook()' as a
utility for python hooks, which spawn external process ?

This can:

  - hide (1) internal "stringify" logic for environment variables and
    (2) logic to make in-memory changes (not only changelog, but also
    dirstate) visible to external process

  - provide the convenient way to get appropriate environment
    variables for external process

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pierre-Yves David - Nov. 6, 2015, 4:16 p.m.
On 11/05/2015 02:18 PM, FUJIWARA Katsunori wrote:
> At Wed, 4 Nov 2015 17:11:12 -0800,
> Durham Goode wrote:
>>
>> On 11/4/15 4:26 PM, FUJIWARA Katsunori wrote:
>>> At Tue, 3 Nov 2015 17:19:37 -0800,
>>> Durham Goode wrote:
>>>> # HG changeset patch
>>>> # User Durham Goode <durham@fb.com>
>>>> # Date 1446598693 28800
>>>> #      Tue Nov 03 16:58:13 2015 -0800
>>>> # Branch stable
>>>> # Node ID 0a772553285d61aeea3226f74698223b09fae7ac
>>>> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
>>>> hooks: always include HG_PENDING
>>>>
>>>> Previously we would only include HG_PENDING in the hook args if the
>>>> transaction's writepending() actually wrote something. This is a bad criteria,
>>>> since it's possible that a previous call to writepending() wrote stuff and the
>>>> hooks want to still see that.
>>>>
>>>> The solution is to always have hooks execute within the scope of the pending
>>>> changes by always putting HG_PENDING in the environment.
>>>>
>>>> diff --git a/mercurial/hook.py b/mercurial/hook.py
>>>> --- a/mercurial/hook.py
>>>> +++ b/mercurial/hook.py
>>>> @@ -122,7 +122,8 @@ def _exthook(ui, repo, name, cmd, args,
>>>>        # make in-memory changes visible to external process
>>>>        tr = repo.currenttransaction()
>>>>        repo.dirstate.write(tr)
>>>> -    if tr and tr.writepending():
>>>> +    if tr:
>>>> +        tr.writepending()
>>>>            env['HG_PENDING'] = repo.root
>>>>
>>>>        for k, v in args.iteritems():
>>> 'transaction.writepending()' saves whether there is any pending data
>>> or not into 'self._anypending'.
>>>
>>>       https://selenic.com/repo/hg/file/e5a1df51bb25/mercurial/transaction.py#l350
>>>
>>>       @active
>>>       def writepending(self):
>>>           '''write pending file to temporary version
>>>
>>>           This is used to allow hooks to view a transaction before commit'''
>>>           categories = sorted(self._pendingcallback)
>>>           for cat in categories:
>>>               # remove callback since the data will have been flushed
>>>               any = self._pendingcallback.pop(cat)(self)
>>>               self._anypending = self._anypending or any
>>>           self._anypending |= self._generatefiles(suffix='.pending')
>>>           return self._anypending
>>>
>>> It should return True as expected, if pending data has been written out
>>> once.
>>>
>>> The root cause of this issue seems that 'transaction.writepending()'
>>> removes pending callback invoked once, even if pending callback didn't
>>> actually write changes out yet.
>>>
>>> This removal of callbacks is reasonable, if it is assumed that pending
>>> callback may not be "idempotent" (e.g. writing into the file opened
>>> with "append" mode).
>>>
>>> But on the other hand, this removal may cause similar issue in other
>>> code paths = other hook combinations (or introducing new hooks).
>>>
>>> To avoid such similar issues:
>>>
>>>     - invoke 'changelog.delayupdate()' explicitly before
>>>       'transaction.writepending()' for external hook invocation, like
>>>       'dirstate.write(tr)' above
>>>
>>>       e.g. in _exthook()@hook.py
>>>
>>>           tr = repo.currenttransaction()
>>>           repo.dirstate.write(tr)
>>>           if tr:
>>>               repo.changelog.delayupdate(tr)
>>>               if tr.writepending()
>>>                   env['HG_PENDING'] = repo.root
>>>
>>>     - invoke 'changelog.delayupdate()' internally in changelog by
>>>       overriding functions below, which open files in 'w'/'a' mode by
>>>       'self.opener'
>>>
>>>       - revlog.addgroup
>>>       - revlog.addrevision
>>>       - revlog.checkinlinesize
>>>
>>>       e.g.:
>>>
>>>       def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
>>>                       node=None)
>>>           self.delayupdate(transaction)
>>>           return super(changelog, self).addrevision(.....)
>>>
>>> The former still needs 'changelog.delayupdate()' before starting
>>> changes to setup opener (and other internal status) of changelog, but
>>> the latter doesn't.
>>>
>>> I think the latter is reasonable. (yes, the former is just for
>>> comparison :-)) How about these ?
>> I think your second proposal sounds reasonable.  And will catch any
>> cases that are currently missing it.
>>>
>>>
>>> BTW, #1 of this series causes that HG_PENDING is always passed to
>>> external hooks while transaction running, even if there is no pending
>>> data. Is this reasonable ? I think that this should be backed out to
>>> avoid unintentional (trial) accessing to pending files.
>
>> Wouldn't we always want hooks to be able to see the current state of the
>> repo? This was originally added because there was a test failure if I
>> didn't add it.  I must've changed my implementation of #2 because now
>> there is no test failure if I exclude #1.  So I'm ok backing out #1 if
>> you think we should.
>
> At first, as far as I confirmed, #2 can fix issue4934 (= test newly
> added in #2 can run successfully) even without #1.
>
> The root cause of issue4934 is that 'transaction.writepending()'
> doesn't imply 'changelog._writepending()', which writes buffered
> changes out, at the second hook invocation. Lack of HG_PENDING in this
> case is just a side effect of it.
>
> Therefore, it should be specification matter whether HG_PENDING can be
> defined always, IMHO.
>
> Pierre-Yves, does your describing "HG_PENDING must -only- occured in
> the context of a transaction" mean that HG_PENDING can be defined even
> if there is no pending data yet while transaction running ? (I think
> it doesn't)
>
>      https://www.mercurial-scm.org/wiki/DirstateTransactionPlan#Detail_about_HG_PENDING_mechanism

I confirm that #1 seems wrong to me:

- writepending will is already properly reporting "True" for all call 
once something happened

- asking hooks to read the "pending" data when non-exist is superfluous 
and a change in behavior.

I've pushed a backout of #1 to the clowncopter.

I'm currently looking at #2 and related discussion.

Patch

diff --git a/mercurial/hook.py b/mercurial/hook.py
--- a/mercurial/hook.py
+++ b/mercurial/hook.py
@@ -122,7 +122,8 @@  def _exthook(ui, repo, name, cmd, args, 
     # make in-memory changes visible to external process
     tr = repo.currenttransaction()
     repo.dirstate.write(tr)
-    if tr and tr.writepending():
+    if tr:
+        tr.writepending()
         env['HG_PENDING'] = repo.root
 
     for k, v in args.iteritems():
diff --git a/tests/test-hook.t b/tests/test-hook.t
--- a/tests/test-hook.t
+++ b/tests/test-hook.t
@@ -113,7 +113,7 @@  test generic hooks
   $ hg pull ../a
   pulling from ../a
   searching for changes
-  prechangegroup hook: HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
+  prechangegroup hook: HG_PENDING=$TESTTMP/b HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
   adding changesets
   adding manifests
   adding file changes
@@ -272,7 +272,7 @@  test that prepushkey can prevent incomin
   listkeys hook: HG_NAMESPACE=bookmarks HG_VALUES={'bar': '0000000000000000000000000000000000000000', 'foo': '0000000000000000000000000000000000000000'}
   no changes found
   pretxnopen hook: HG_TXNID=TXN:* HG_TXNNAME=push (glob)
-  prepushkey.forbid hook: HG_BUNDLE2=1 HG_KEY=baz HG_NAMESPACE=bookmarks HG_NEW=0000000000000000000000000000000000000000 HG_SOURCE=push HG_TXNID=TXN:* HG_URL=push (glob)
+  prepushkey.forbid hook: HG_BUNDLE2=1 HG_KEY=baz HG_NAMESPACE=bookmarks HG_NEW=0000000000000000000000000000000000000000 HG_PENDING=$TESTTMP/a HG_SOURCE=push HG_TXNID=TXN:* HG_URL=push (glob)
   pushkey-abort: prepushkey hook exited with status 1
   abort: exporting bookmark baz failed!
   [255]
@@ -306,7 +306,7 @@  prechangegroup hook can prevent incoming
   $ hg pull ../a
   pulling from ../a
   searching for changes
-  prechangegroup.forbid hook: HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
+  prechangegroup.forbid hook: HG_PENDING=$TESTTMP/b HG_SOURCE=pull HG_TXNID=TXN:* HG_URL=file:$TESTTMP/a (glob)
   abort: prechangegroup.forbid hook exited with status 1
   [255]