Patchwork transaction: move changelog finalizer to be before bookmark finalizer

login
register
mail settings
Submitter Durham Goode
Date March 17, 2016, 6:31 a.m.
Message ID <cdf4a94fdc46d27196f5.1458196310@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13918/
State Changes Requested
Headers show

Comments

Durham Goode - March 17, 2016, 6:31 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1458196059 25200
#      Wed Mar 16 23:27:39 2016 -0700
# Node ID cdf4a94fdc46d27196f5411fc7a4008690834fba
# Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
transaction: move changelog finalizer to be before bookmark finalizer

Previously, transaction close would run the file generators before running the
finalizers (see the list below for what is in each). Since file generators
contain the bookmarks and the dirstate, this meant we made the dirstate and
bookmarks visible to external readers before we actually wrote the commits into
the changelog, which could result in missing bookmarks and missing working copy
parents (especially on servers with high commit throughput, since pulls might
fail to see certain bookmarks in this situation).

By moving the changelog writing to be before the bookmark/dirstate writing, we
ensure the commits are present before they are referenced.

For reference, file generators currently consist of: bookmarks, dirstate, and
phases. Finalizers currently consist of: changelog, revbranchcache, and fncache.
All of the former reference the latter, so therefore the latter should be
written first.

Technically there's still plenty of race conditions (can the order of finalizers
affect how external readers see the repo?), but this is a step forward at least.
Ryan McElroy - March 18, 2016, 6:22 p.m.
On 3/16/2016 11:31 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1458196059 25200
> #      Wed Mar 16 23:27:39 2016 -0700
> # Node ID cdf4a94fdc46d27196f5411fc7a4008690834fba
> # Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
> transaction: move changelog finalizer to be before bookmark finalizer
>
> Previously, transaction close would run the file generators before running the
> finalizers (see the list below for what is in each). Since file generators
> contain the bookmarks and the dirstate, this meant we made the dirstate and
> bookmarks visible to external readers before we actually wrote the commits into
> the changelog, which could result in missing bookmarks and missing working copy
> parents (especially on servers with high commit throughput, since pulls might
> fail to see certain bookmarks in this situation).
>
> By moving the changelog writing to be before the bookmark/dirstate writing, we
> ensure the commits are present before they are referenced.
>
> For reference, file generators currently consist of: bookmarks, dirstate, and
> phases. Finalizers currently consist of: changelog, revbranchcache, and fncache.
> All of the former reference the latter, so therefore the latter should be
> written first.
>
> Technically there's still plenty of race conditions (can the order of finalizers
> affect how external readers see the repo?), but this is a step forward at least.
>
> diff --git a/mercurial/transaction.py b/mercurial/transaction.py
> --- a/mercurial/transaction.py
> +++ b/mercurial/transaction.py
> @@ -407,10 +407,10 @@ class transaction(object):
>           '''commit the transaction'''
>           if self.count == 1:
>               self.validator(self)  # will raise exception if needed
> -            self._generatefiles()
>               categories = sorted(self._finalizecallback)
>               for cat in categories:
>                   self._finalizecallback[cat](self)
> +            self._generatefiles()
>   
>           self.count -= 1
>           if self.count != 0:
> diff --git a/tests/test-bookmarks.t b/tests/test-bookmarks.t
> --- a/tests/test-bookmarks.t
> +++ b/tests/test-bookmarks.t
> @@ -846,3 +846,52 @@ updates the working directory and curren
>     6:81dcce76aa0b
>     $ hg -R ../cloned-bookmarks-update bookmarks | grep ' Y '
>      * Y                         6:81dcce76aa0b
> +
> +  $ cd ..
> +
> +ensure changelog is written before bookmarks
> +  $ hg init orderrepo
> +  $ cd orderrepo
> +  $ touch a
> +  $ hg commit -Aqm one
> +  $ hg book mybook
> +  $ echo a > a
> +
> +  $ cat > $TESTTMP/pausefinalize.py <<EOF
> +  > from mercurial import extensions, localrepo
> +  > import os, time
> +  > def transaction(orig, self, desc, report=None):
> +  >    tr = orig(self, desc, report)
> +  >    def sleep(*args, **kwargs):
> +  >        retry = 20
> +  >        while retry > 0 and not os.path.exists("$TESTTMP/unpause"):
> +  >            retry -= 1
> +  >            time.sleep(0.5)
> +  >        if os.path.exists("$TESTTMP/unpause"):
> +  >            os.remove("$TESTTMP/unpause")
> +  >    # It is important that this finalizer start with 'a', so it runs before
> +  >    # the changelog finalizer appends to the changelog.
> +  >    tr.addfinalize('a-sleep', sleep)
> +  >    return tr
> +  >
> +  > def extsetup(ui):
> +  >    # This extension inserts an artifical pause during the transaction
> +  >    # finalizer, so we can run commands mid-transaction-close.
> +  >    extensions.wrapfunction(localrepo.localrepository, 'transaction',
> +  >                            transaction)
> +  > EOF
> +  $ hg commit -qm two --config extensions.pausefinalize=$TESTTMP/pausefinalize.py &
> +  $ sleep 2
> +  $ hg log -r .
> +  changeset:   0:867bc5792c8c
> +  bookmark:    mybook
> +  tag:         tip
> +  user:        test
> +  date:        Thu Jan 01 00:00:00 1970 +0000
> +  summary:     one
> +
> +  $ hg bookmarks
> +   * mybook                    0:867bc5792c8c
> +  $ touch $TESTTMP/unpause
> +
> +  $ cd ..
>
This patch looks minimal and safe to me. The particular race condition 
that currently exists is really bad in high-commit-rate repositories -- 
bookmarks often "go missing"altogether, which is a terrible user experience.
Pierre-Yves David - March 19, 2016, 1:07 a.m.
On 03/16/2016 11:31 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1458196059 25200
> #      Wed Mar 16 23:27:39 2016 -0700
> # Node ID cdf4a94fdc46d27196f5411fc7a4008690834fba
> # Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
> transaction: move changelog finalizer to be before bookmark finalizer
>
> Previously, transaction close would run the file generators before running the
> finalizers (see the list below for what is in each). Since file generators
> contain the bookmarks and the dirstate, this meant we made the dirstate and
> bookmarks visible to external readers before we actually wrote the commits into
> the changelog, which could result in missing bookmarks and missing working copy
> parents (especially on servers with high commit throughput, since pulls might
> fail to see certain bookmarks in this situation).
>
> By moving the changelog writing to be before the bookmark/dirstate writing, we
> ensure the commits are present before they are referenced.
>
> For reference, file generators currently consist of: bookmarks, dirstate, and
> phases. Finalizers currently consist of: changelog, revbranchcache, and fncache.
> All of the former reference the latter, so therefore the latter should be
> written first.
>
> Technically there's still plenty of race conditions (can the order of finalizers
> affect how external readers see the repo?), but this is a step forward at least.

Can we get a more complete analyse on the race condition we have here 
and which ones we are trading for which others one?

For example, this changes could result in phase root being visible after 
the changesets, making public being seen public while they are not.

I would like to have a clear idea of where we are walking there and what 
tradeoff we are performing.
Durham Goode - March 19, 2016, 5:04 p.m.
On 3/18/16 6:07 PM, Pierre-Yves David wrote:
>
> On 03/16/2016 11:31 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1458196059 25200
>> #      Wed Mar 16 23:27:39 2016 -0700
>> # Node ID cdf4a94fdc46d27196f5411fc7a4008690834fba
>> # Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
>> transaction: move changelog finalizer to be before bookmark finalizer
>>
>> Previously, transaction close would run the file generators before 
>> running the
>> finalizers (see the list below for what is in each). Since file 
>> generators
>> contain the bookmarks and the dirstate, this meant we made the 
>> dirstate and
>> bookmarks visible to external readers before we actually wrote the 
>> commits into
>> the changelog, which could result in missing bookmarks and missing 
>> working copy
>> parents (especially on servers with high commit throughput, since 
>> pulls might
>> fail to see certain bookmarks in this situation).
>>
>> By moving the changelog writing to be before the bookmark/dirstate 
>> writing, we
>> ensure the commits are present before they are referenced.
>>
>> For reference, file generators currently consist of: bookmarks, 
>> dirstate, and
>> phases. Finalizers currently consist of: changelog, revbranchcache, 
>> and fncache.
>> All of the former reference the latter, so therefore the latter 
>> should be
>> written first.
>>
>> Technically there's still plenty of race conditions (can the order of 
>> finalizers
>> affect how external readers see the repo?), but this is a step 
>> forward at least.
>
> Can we get a more complete analyse on the race condition we have here 
> and which ones we are trading for which others one?
>
> For example, this changes could result in phase root being visible 
> after the changesets, making public being seen public while they are not.
>
> I would like to have a clear idea of where we are walking there and 
> what tradeoff we are performing
K, given that file generators are "bookmarks, dirstate, and phases" and 
finalizers are "revlogs, revbranchcache, fncache", the old race 
conditions are:

- bookmarks written before revlogs - causes bookmarks to appear missing 
during read
- dirstate written before revlogs - causes missing working copy parent
- phases written before revlogs - should be benign, since any phase on a 
commit that is missing only affects commits that are also missing
- bookmarks/dirstate/phases written before revbranch cache - no issues
- bookmarks/dirstate/phases written before fncache - no issues

and the new race conditions are:

- revlogs written before bookmarks - may see commits where the bookmark 
hasn't moved on top of them yet; odd but not an invalid state
- revlogs written before dirstate - no issues
- revlogs written before phases - some commits may show up with the 
wrong phase (potentially public)
- revbranch cache written before bookmarks/dirstate/phases - no issues
- fncache written before bookmarks/dirstate/phases - no issues

Race conditions that existed before (within the two categories):
- fncache before revlog - fncache could point to file that doesn't exist
- revlog before fncache - fncache could be missing a file


So in summary, my change removes the invalid bookmarks and invalid 
dirstate case, and replaces it with an out-of-date phase case.  The out 
of date phase case could potentially be bad, since if we send public 
commits to the client, it may be hard to make them draft later. On the 
flip side, most central-server setups are publishing servers, so this 
wouldn't actually be an issue for them. This could be fixed by moving 
the phase writes to be before the revlogs (or by redoing the entire 
storage system to be more widely transactional).
Pierre-Yves David - March 19, 2016, 7:30 p.m.
On 03/19/2016 10:04 AM, Durham Goode wrote:
>
> On 3/18/16 6:07 PM, Pierre-Yves David wrote:
>> I would like to have a clear idea of where we are walking there and
>> what tradeoff we are performing
> K, given that file generators are "bookmarks, dirstate, and phases" and
> finalizers are "revlogs, revbranchcache, fncache", the old race
> conditions are:
>
> - bookmarks written before revlogs - causes bookmarks to appear missing
> during read
> - dirstate written before revlogs - causes missing working copy parent

I'm not too worried about this one (well, it is annoying and I'm 
suspecting my thg is getting hit by that from time to time). However, 
server usually does not have a working copy and clients have no way to 
access it anyway.

> - phases written before revlogs - should be benign, since any phase on a
> commit that is missing only affects commits that are also missing

We could be seeing changeset as draft while they should be public. This 
can have some annoying consequence but the next pull should fix it 
silently, so it is not too bad.

(This impact people who have hooks to enforce constraint on there 
repository state, the race could breaks this guarantee)

> - bookmarks/dirstate/phases written before revbranch cache - no issues
> - bookmarks/dirstate/phases written before fncache - no issues
>
> and the new race conditions are:
>
> - revlogs written before bookmarks - may see commits where the bookmark
> hasn't moved on top of them yet; odd but not an invalid state

Less bad than the other one but still annoying:

   (This impact people who have hooks to enforce constraint on there 
repository state, the race could breaks this guarantee)

> - revlogs written before dirstate - no issues
> - revlogs written before phases - some commits may show up with the
> wrong phase (potentially public)

Yep, this pretty bad, having changeset being published "by chance" is 
pretty bad as this is not really recoverable. Especially there is people 
using non-publishing repository in production with strict permission for 
publishing (including, to some extend the mercurial project). We cannot 
regress this.

> - revbranch cache written before bookmarks/dirstate/phases - no issues
> - fncache written before bookmarks/dirstate/phases - no issues
>
> Race conditions that existed before (within the two categories):
> - fncache before revlog - fncache could point to file that doesn't exist
> - revlog before fncache - fncache could be missing a file
>
>
> So in summary, my change removes the invalid bookmarks and invalid
> dirstate case, and replaces it with an out-of-date phase case.  The out
> of date phase case could potentially be bad, since if we send public
> commits to the client, it may be hard to make them draft later. On the
> flip side, most central-server setups are publishing servers, so this
> wouldn't actually be an issue for them. This could be fixed by moving
> the phase writes to be before the revlogs

Can you poke at what it would look like to have phases written before 
and bookmark written after as mitigation?

> (or by redoing the entire
> storage system to be more widely transactional).

Patch

diff --git a/mercurial/transaction.py b/mercurial/transaction.py
--- a/mercurial/transaction.py
+++ b/mercurial/transaction.py
@@ -407,10 +407,10 @@  class transaction(object):
         '''commit the transaction'''
         if self.count == 1:
             self.validator(self)  # will raise exception if needed
-            self._generatefiles()
             categories = sorted(self._finalizecallback)
             for cat in categories:
                 self._finalizecallback[cat](self)
+            self._generatefiles()
 
         self.count -= 1
         if self.count != 0:
diff --git a/tests/test-bookmarks.t b/tests/test-bookmarks.t
--- a/tests/test-bookmarks.t
+++ b/tests/test-bookmarks.t
@@ -846,3 +846,52 @@  updates the working directory and curren
   6:81dcce76aa0b
   $ hg -R ../cloned-bookmarks-update bookmarks | grep ' Y '
    * Y                         6:81dcce76aa0b
+
+  $ cd ..
+
+ensure changelog is written before bookmarks
+  $ hg init orderrepo
+  $ cd orderrepo
+  $ touch a
+  $ hg commit -Aqm one
+  $ hg book mybook
+  $ echo a > a
+
+  $ cat > $TESTTMP/pausefinalize.py <<EOF
+  > from mercurial import extensions, localrepo
+  > import os, time
+  > def transaction(orig, self, desc, report=None):
+  >    tr = orig(self, desc, report)
+  >    def sleep(*args, **kwargs):
+  >        retry = 20
+  >        while retry > 0 and not os.path.exists("$TESTTMP/unpause"):
+  >            retry -= 1
+  >            time.sleep(0.5)
+  >        if os.path.exists("$TESTTMP/unpause"):
+  >            os.remove("$TESTTMP/unpause")
+  >    # It is important that this finalizer start with 'a', so it runs before
+  >    # the changelog finalizer appends to the changelog.
+  >    tr.addfinalize('a-sleep', sleep)
+  >    return tr
+  > 
+  > def extsetup(ui):
+  >    # This extension inserts an artifical pause during the transaction
+  >    # finalizer, so we can run commands mid-transaction-close.
+  >    extensions.wrapfunction(localrepo.localrepository, 'transaction',
+  >                            transaction)
+  > EOF
+  $ hg commit -qm two --config extensions.pausefinalize=$TESTTMP/pausefinalize.py &
+  $ sleep 2
+  $ hg log -r .
+  changeset:   0:867bc5792c8c
+  bookmark:    mybook
+  tag:         tip
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     one
+  
+  $ hg bookmarks
+   * mybook                    0:867bc5792c8c
+  $ touch $TESTTMP/unpause
+
+  $ cd ..