Patchwork [V2] transaction: allow running file generators after finalizers

Submitter Durham Goode
Date March 20, 2016, 5:49 p.m.
Message ID <1431f0447a6b0be851f6.1458496171@dev8486.prn1.facebook.com>
Permalink /patch/13982/
State Superseded
Commit a5009789960caf68dd669ae0773f43500f44a9f6

Comments

Durham Goode - March 20, 2016, 5:49 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1458495989 25200
#      Sun Mar 20 10:46:29 2016 -0700
# Node ID 1431f0447a6b0be851f609b386fabf83b2f54666
# Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
transaction: allow running file generators after finalizers

Previously, transaction.close would run the file generators before running the
finalizers (see the list below for what is in each). Since the file generators
include the bookmarks and the dirstate, this meant we made the dirstate and
bookmarks visible to external readers before we actually wrote the commits into
the changelog. This could result in missing bookmarks and missing working copy
parents, especially on servers with high commit throughput, where pulls might
fail to see certain bookmarks.

By writing the changelog before the bookmarks and the dirstate, we ensure that
the commits are present before anything references them.
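The ordering problem can be illustrated with a toy model. This is not Mercurial code: `close_steps`, `dangling`, the node ids and the dict-based "on-disk state" are all hypothetical. A reader that snapshots state after each step of closing a transaction sees a bookmark pointing at a node the changelog does not yet contain only when the bookmarks are written first.

```python
# Toy model (hypothetical names, not Mercurial's API) of what a concurrent
# reader can observe between the steps of closing a transaction.

def close_steps(order):
    """Yield a snapshot of the visible state after each step in `order`."""
    changelog = {"n0"}            # nodes already visible to readers
    bookmarks = {"mybook": "n0"}  # bookmark name -> node
    pending_node = "n1"           # commit added by the transaction
    for step in order:
        if step == "changelog":
            changelog = changelog | {pending_node}
        elif step == "bookmarks":
            bookmarks = dict(bookmarks, mybook=pending_node)
        yield set(changelog), dict(bookmarks)

def dangling(order):
    """True if any snapshot has a bookmark pointing at an unknown node."""
    for changelog, bookmarks in close_steps(order):
        if any(node not in changelog for node in bookmarks.values()):
            return True
    return False

print(dangling(["bookmarks", "changelog"]))  # old order: True
print(dangling(["changelog", "bookmarks"]))  # new order: False
```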

This implementation allows certain file generators to run after the finalizers.
We didn't want to move all of the generators, since it's important that some of
them, like phases, still run before the finalizers (otherwise commits could be
exposed as public when they shouldn't be).

For reference, file generators currently consist of: bookmarks, dirstate, and
phases. Finalizers currently consist of: changelog, revbranchcache, and fncache.
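A minimal sketch of the resulting close order, assuming a simplified model in which generators and finalizers are plain names. `POSTFINALIZE_GENERATORS` mirrors the patch's `postfinalizegenerators` set. `close_order` is hypothetical and, for simplicity, sorts generators by id.

```python
# Simplified model of transaction.close() ordering after this patch.
# Not Mercurial code; only the generator/finalizer names come from the text.
POSTFINALIZE_GENERATORS = {"bookmarks", "dirstate"}

def close_order(generators, finalizers):
    """Return the order in which generators and finalizers would run."""
    ran = []
    # 1. generators that must run before the finalizers (e.g. phases)
    for genid in sorted(generators):
        if genid not in POSTFINALIZE_GENERATORS:
            ran.append("gen:" + genid)
    # 2. finalizers, in sorted category order
    for category in sorted(finalizers):
        ran.append("fin:" + category)
    # 3. generators that reference data written by the finalizers
    for genid in sorted(generators):
        if genid in POSTFINALIZE_GENERATORS:
            ran.append("gen:" + genid)
    return ran

print(close_order({"bookmarks", "dirstate", "phases"},
                  {"changelog", "fncache", "revbranchcache"}))
# ['gen:phases', 'fin:changelog', 'fin:fncache', 'fin:revbranchcache',
#  'gen:bookmarks', 'gen:dirstate']
```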

Pierre-Yves David - March 27, 2016, 5:15 a.m.
On 03/20/2016 10:49 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1458495989 25200
> #      Sun Mar 20 10:46:29 2016 -0700
> # Node ID 1431f0447a6b0be851f609b386fabf83b2f54666
> # Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
> transaction: allow running file generators after finalizers

The general logic of this patch is good, but the actual changes to the file
generation function seem a bit too fragile to me. See my comments in
context.


> […]
> @@ -276,12 +284,17 @@ class transaction(object):
>           # but for bookmarks that are handled outside this mechanism.
>           self._filegenerators[genid] = (order, filenames, genfunc, location)
>
> -    def _generatefiles(self, suffix=''):
> +    def _generatefiles(self, postfinalize=False, suffix=''):

Having postfinalize=False as the default is an issue here.

Before this patch, tr._generatefiles() would generate all files.
After this patch, only some of them are generated. Third parties and future
callers are going to get this wrong and introduce various transaction
bugs. This is highlighted by the fact that there has to be a special case when
'suffix' is passed.

I think we need three values here: 'all', 'precl', 'postcl':

   def _generatefiles(self, suffix='', group='all')

This is a small change, but this code is critical enough that it deserves
to be free of surprises.


Also, we should try to avoid changing the position of arguments; doing so
gratuitously breaks other callers.
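One possible shape for the suggested three-way group argument is sketched below. It is a hypothetical toy, not the transaction code. The function works on a plain dict mapping a generator id to its file names, and the constant names are made up. It keeps `suffix` as the first optional argument, so existing positional callers are unaffected, and it defaults to generating every file.

```python
# Hypothetical sketch of an explicit `group` argument; names are illustrative.
GEN_GROUP_ALL = "all"
GEN_GROUP_PRE_FINALIZE = "prefinalize"
GEN_GROUP_POST_FINALIZE = "postfinalize"

POSTFINALIZE_GENERATORS = {"bookmarks", "dirstate"}

def generatefiles(filegenerators, suffix="", group=GEN_GROUP_ALL):
    """Return the file names that would be written for `group`.

    `filegenerators` maps a generator id to the file names it writes.
    `suffix` stays the first optional argument so positional callers keep
    working; `group` defaults to every generator, as before the patch.
    """
    written = []
    for genid in sorted(filegenerators):
        post = genid in POSTFINALIZE_GENERATORS
        if group == GEN_GROUP_PRE_FINALIZE and post:
            continue
        if group == GEN_GROUP_POST_FINALIZE and not post:
            continue
        written.extend(name + suffix for name in filegenerators[genid])
    return written

gens = {"bookmarks": ("bookmarks",), "dirstate": ("dirstate",),
        "phases": ("phaseroots",)}
print(generatefiles(gens))                                 # every file
print(generatefiles(gens, group=GEN_GROUP_PRE_FINALIZE))   # ['phaseroots']
print(generatefiles(gens, group=GEN_GROUP_POST_FINALIZE))  # ['bookmarks', 'dirstate']
print(generatefiles(gens, ".pending"))                     # every file, suffixed
```

With this shape, a caller that passes only a suffix still gets every file, so no special case for `suffix` is needed inside the loop.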

>           # write files registered for generation
>           any = False
> -        for entry in sorted(self._filegenerators.values()):
> +        for id, entry in sorted(self._filegenerators.iteritems()):
>               any = True
>               order, filenames, genfunc, location = entry
> +
> +            # for generation at closing, check if it's before or after finalize
> +            if not suffix and (id in postfinalizegenerators) != (postfinalize):
> +                continue
> +

[above, the suffix "hack" I was referring to]

> […]

Cheers,

Katsunori FUJIWARA - March 29, 2016, 9:14 a.m.
At Sat, 26 Mar 2016 22:15:50 -0700,
Pierre-Yves David wrote:
> 
> On 03/20/2016 10:49 AM, Durham Goode wrote:
> [...]
> 
> I think we need three values here: 'all', 'precl', 'postcl':
> 
>    def _generatefiles(self, suffix='', group='all')

If possible, wording without abbreviations is friendlier to non-native
English speakers like me :-)

If "preclose"/"postclose" is too long, how about "closing"/"closed"?


> -- 
> Pierre-Yves David
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Katsunori FUJIWARA - March 29, 2016, 8:30 p.m.
At Tue, 29 Mar 2016 18:14:42 +0900,
FUJIWARA Katsunori wrote:
> 
> At Sat, 26 Mar 2016 22:15:50 -0700,
> Pierre-Yves David wrote:
> > 
> > On 03/20/2016 10:49 AM, Durham Goode wrote:
> > [...]
> > 
> > I think we need three values here: 'all', 'precl', 'postcl':
> > 
> >    def _generatefiles(self, suffix='', group='all')
> 
> If possible, wording without abbreviations is friendlier to non-native
> English speakers like me :-)
> 
> If "preclose"/"postclose" is too long, how about "closing"/"closed"?

I just remembered that "cl" is often used as an abbreviation of "changelog"
in the Mercurial source. Did you mean pre/post-changelog(-finalization)?



----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Pierre-Yves David - March 29, 2016, 8:42 p.m.
On 03/29/2016 01:30 PM, FUJIWARA Katsunori wrote:
> At Tue, 29 Mar 2016 18:14:42 +0900,
> FUJIWARA Katsunori wrote:
>>
>> At Sat, 26 Mar 2016 22:15:50 -0700,
>> Pierre-Yves David wrote:
>>>
>>> On 03/20/2016 10:49 AM, Durham Goode wrote:
>>> [...]
>>>
>>> I think we need three values here: 'all', 'precl', 'postcl':
>>>
>>>     def _generatefiles(self, suffix='', group='all')
>>
>> If possible, wording without abbreviations is friendlier to non-native
>> English speakers like me :-)
>>
>> If "preclose"/"postclose" is too long, how about "closing"/"closed"?
>
> I just remembered that "cl" is often used as an abbreviation of "changelog"
> in the Mercurial source. Did you mean pre/post-changelog(-finalization)?

I actually did. This kind of proves your point, because you expanded the
abbreviation wrong, and disproves it, because you eventually got it
right. But you are probably right that explicit is better than
abbreviated.

(Also, keep in mind that I'm a non-native speaker too, even if I'm
kind of cheating, as most of my advanced vocabulary was forcefully
imported into the English language over the past millennium.)

Cheers,

Patch

diff --git a/mercurial/transaction.py b/mercurial/transaction.py
--- a/mercurial/transaction.py
+++ b/mercurial/transaction.py
@@ -23,6 +23,14 @@  from . import (
 
 version = 2
 
+# These are the file generators that should only be executed after the
+# finalizers are done, since they rely on the output of the finalizers (like
+# the changelog having been written).
+postfinalizegenerators = set([
+    'bookmarks',
+    'dirstate'
+])
+
 def active(func):
     def _active(self, *args, **kwds):
         if self.count == 0:
@@ -276,12 +284,17 @@  class transaction(object):
         # but for bookmarks that are handled outside this mechanism.
         self._filegenerators[genid] = (order, filenames, genfunc, location)
 
-    def _generatefiles(self, suffix=''):
+    def _generatefiles(self, postfinalize=False, suffix=''):
         # write files registered for generation
         any = False
-        for entry in sorted(self._filegenerators.values()):
+        for id, entry in sorted(self._filegenerators.iteritems()):
             any = True
             order, filenames, genfunc, location = entry
+
+            # for generation at closing, check if it's before or after finalize
+            if not suffix and (id in postfinalizegenerators) != (postfinalize):
+                continue
+
             vfs = self._vfsmap[location]
             files = []
             try:
@@ -407,10 +420,11 @@  class transaction(object):
         '''commit the transaction'''
         if self.count == 1:
             self.validator(self)  # will raise exception if needed
-            self._generatefiles()
+            self._generatefiles(postfinalize=False)
             categories = sorted(self._finalizecallback)
             for cat in categories:
                 self._finalizecallback[cat](self)
+            self._generatefiles(postfinalize=True)
 
         self.count -= 1
         if self.count != 0:
diff --git a/tests/test-bookmarks.t b/tests/test-bookmarks.t
--- a/tests/test-bookmarks.t
+++ b/tests/test-bookmarks.t
@@ -846,3 +846,52 @@  updates the working directory and curren
   6:81dcce76aa0b
   $ hg -R ../cloned-bookmarks-update bookmarks | grep ' Y '
    * Y                         6:81dcce76aa0b
+
+  $ cd ..
+
+ensure changelog is written before bookmarks
+  $ hg init orderrepo
+  $ cd orderrepo
+  $ touch a
+  $ hg commit -Aqm one
+  $ hg book mybook
+  $ echo a > a
+
+  $ cat > $TESTTMP/pausefinalize.py <<EOF
+  > from mercurial import extensions, localrepo
+  > import os, time
+  > def transaction(orig, self, desc, report=None):
+  >    tr = orig(self, desc, report)
+  >    def sleep(*args, **kwargs):
+  >        retry = 20
+  >        while retry > 0 and not os.path.exists("$TESTTMP/unpause"):
+  >            retry -= 1
+  >            time.sleep(0.5)
+  >        if os.path.exists("$TESTTMP/unpause"):
+  >            os.remove("$TESTTMP/unpause")
+  >    # It is important that this finalizer starts with 'a', so it runs before
+  >    # the changelog finalizer appends to the changelog.
+  >    tr.addfinalize('a-sleep', sleep)
+  >    return tr
+  > 
+  > def extsetup(ui):
+  >    # This extension inserts an artificial pause during the transaction
+  >    # finalizer, so we can run commands mid-transaction-close.
+  >    extensions.wrapfunction(localrepo.localrepository, 'transaction',
+  >                            transaction)
+  > EOF
+  $ hg commit -qm two --config extensions.pausefinalize=$TESTTMP/pausefinalize.py &
+  $ sleep 2
+  $ hg log -r .
+  changeset:   0:867bc5792c8c
+  bookmark:    mybook
+  tag:         tip
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     one
+  
+  $ hg bookmarks
+   * mybook                    0:867bc5792c8c
+  $ touch $TESTTMP/unpause
+
+  $ cd ..
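The test relies on finalizers running in sorted category order (`sorted(self._finalizecallback)` in `close`). The toy re-enactment below assumes, for illustration only, that the changelog finalizer's category begins with "cl-", so that "a-sleep" sorts ahead of it. `observe_during_pause` and its state dict are hypothetical stand-ins for what a concurrent `hg log -r .` / `hg bookmarks` would read during the pause.

```python
# Toy re-enactment of the test (not Mercurial code). Assumption: the
# changelog finalizer's category sorts after "a-sleep" (here "cl-1").

def observe_during_pause(postfinalize_bookmarks):
    state = {"changelog_tip": 0, "mybook": 0}
    seen = {}

    def write_bookmarks():
        state["mybook"] = 1  # bookmark moves to the new commit (rev 1)

    finalizers = {
        "a-sleep": lambda: seen.update(state),  # reader runs during the pause
        "cl-1": lambda: state.update(changelog_tip=1),
    }

    if not postfinalize_bookmarks:
        write_bookmarks()  # old behaviour: generators before finalizers
    for category in sorted(finalizers):  # "a-sleep" < "cl-1"
        finalizers[category]()
    if postfinalize_bookmarks:
        write_bookmarks()  # patched behaviour: bookmarks after finalizers
    return seen

print(observe_during_pause(False))  # {'changelog_tip': 0, 'mybook': 1}
print(observe_during_pause(True))   # {'changelog_tip': 0, 'mybook': 0}
```

In the old ordering the reader sees `mybook` pointing at a revision beyond the changelog tip, which is the inconsistency the test guards against.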