Patchwork [3,of,4] streamclone: clear caches after writing changes into files for visibility

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Sept. 2, 2016, 6:34 p.m.
Message ID <33e29b5ebe885cdeeef4.1472841252@feefifofum>
Download mbox | patch
Permalink /patch/16537/
State Changes Requested
Delegated to: Yuya Nishihara
Headers show

Comments

Katsunori FUJIWARA - Sept. 2, 2016, 6:34 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1472840249 -32400
#      Sat Sep 03 03:17:29 2016 +0900
# Node ID 33e29b5ebe885cdeeef433861e2d165b882d48d1
# Parent  32f53aaf63f8db0c1352a3b0afb290c425486704
streamclone: clear caches after writing changes into files for visibility

Before this patch, streamclone-ed changes are invisible via @filecache
properties to in-process procedures before closing transaction
(e.g. pretxnclose python hook), if corresponded property is cached
before consumev1(). Strictly speaking, caching should occur inside
(store) lock for transaction.

repo.invalidate() after closing transaction is too late to force
@filecache properties to be reloaded from changed files at next
access.

For visibility of streamclone-ed changes to in-process procedures
before closing transaction, this patch clears caches just after
writing changes into files.

Simply moving repo.invalidate() invocation into transaction scope
causes unexpected result, because it clears in-memory changes of
fncache before writing them out at closing transaction. "unexpected
result" requires rebuilding fncache by "hg debugrebuildfncache".

Therefore, this patch uses combination of invalidatefilecaches() and
invalidatecaches() instead of invalidate(). The latter consists of the
formers and store.invalidatecaches(), which clears in-memory fncache
changes.

BTW, regardless of changes in this patch, clearing cached properties
in consumev1() causes inconsistency, if (1) transaction is started and
(2) any @filecache property is changed before consumev1().

This (potential) inconsistency should be fixed in the future, and
this is reason why this patch adds comment about it as memorandum.
Yuya Nishihara - Sept. 6, 2016, 2:13 p.m.
On Sat, 03 Sep 2016 03:34:12 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1472840249 -32400
> #      Sat Sep 03 03:17:29 2016 +0900
> # Node ID 33e29b5ebe885cdeeef433861e2d165b882d48d1
> # Parent  32f53aaf63f8db0c1352a3b0afb290c425486704
> streamclone: clear caches after writing changes into files for visibility
> 
> Before this patch, streamclone-ed changes are invisible via @filecache
> properties to in-process procedures before closing transaction
> (e.g. pretxnclose python hook), if corresponded property is cached
> before consumev1(). Strictly speaking, caching should occur inside
> (store) lock for transaction.
> 
> repo.invalidate() after closing transaction is too late to force
> @filecache properties to be reloaded from changed files at next
> access.
> 
> For visibility of streamclone-ed changes to in-process procedures
> before closing transaction, this patch clears caches just after
> writing changes into files.
> 
> Simply moving repo.invalidate() invocation into transaction scope
> causes unexpected result, because it clears in-memory changes of
> fncache before writing them out at closing transaction. "unexpected
> result" requires rebuilding fncache by "hg debugrebuildfncache".

Does it mean store.invalidatecaches() should always be omitted in transaction?
If that's true, I'd rather make repo.invalidate() skip it conditionally
and add a comment why.

> -        # Writing straight to files circumvented the inmemory caches
> -        repo.invalidate(clearfilecache=True)
> +            # force @filecache properties to be reloaded from
> +            # streamclone-ed file at next access
> +            repo.invalidatefilecaches(clearcache=True)
> +
> +            # clear already cached @propertycache properties or so
> +            repo.invalidatecaches()

It seems obscure that we should never call store.invalidatecaches() here.
I think that is the matter of the localrepo.
Pierre-Yves David - Sept. 6, 2016, 3:34 p.m.
On 09/06/2016 04:13 PM, Yuya Nishihara wrote:
> On Sat, 03 Sep 2016 03:34:12 +0900, FUJIWARA Katsunori wrote:
>> # HG changeset patch
>> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
>> # Date 1472840249 -32400
>> #      Sat Sep 03 03:17:29 2016 +0900
>> # Node ID 33e29b5ebe885cdeeef433861e2d165b882d48d1
>> # Parent  32f53aaf63f8db0c1352a3b0afb290c425486704
>> streamclone: clear caches after writing changes into files for visibility
>>
>> Before this patch, streamclone-ed changes are invisible via @filecache
>> properties to in-process procedures before closing transaction
>> (e.g. pretxnclose python hook), if corresponded property is cached
>> before consumev1(). Strictly speaking, caching should occur inside
>> (store) lock for transaction.
>>
>> repo.invalidate() after closing transaction is too late to force
>> @filecache properties to be reloaded from changed files at next
>> access.
>>
>> For visibility of streamclone-ed changes to in-process procedures
>> before closing transaction, this patch clears caches just after
>> writing changes into files.
>>
>> Simply moving repo.invalidate() invocation into transaction scope
>> causes unexpected result, because it clears in-memory changes of
>> fncache before writing them out at closing transaction. "unexpected
>> result" requires rebuilding fncache by "hg debugrebuildfncache".
>
> Does it mean store.invalidatecaches() should always be omitted in transaction?
> If that's true, I'd rather make repo.invalidate() skip it conditionally
> and add a comment why.

Before making too much change to when the various cahce invalidation 
function are called, I think it would be quite important to precisely 
document the semantic of each of them. And update that definition if it 
change. That way, extensions could reliably pick the right function to 
piggy back to and would not be in constant danger to have a subtle 
change/misunderstanding pushing them into misbehaving.
Katsunori FUJIWARA - Sept. 7, 2016, 8:38 p.m.
At Tue, 6 Sep 2016 23:13:37 +0900,
Yuya Nishihara wrote:
> 
> On Sat, 03 Sep 2016 03:34:12 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1472840249 -32400
> > #      Sat Sep 03 03:17:29 2016 +0900
> > # Node ID 33e29b5ebe885cdeeef433861e2d165b882d48d1
> > # Parent  32f53aaf63f8db0c1352a3b0afb290c425486704
> > streamclone: clear caches after writing changes into files for visibility
> > 
> > Before this patch, streamclone-ed changes are invisible via @filecache
> > properties to in-process procedures before closing transaction
> > (e.g. pretxnclose python hook), if corresponded property is cached
> > before consumev1(). Strictly speaking, caching should occur inside
> > (store) lock for transaction.
> > 
> > repo.invalidate() after closing transaction is too late to force
> > @filecache properties to be reloaded from changed files at next
> > access.
> > 
> > For visibility of streamclone-ed changes to in-process procedures
> > before closing transaction, this patch clears caches just after
> > writing changes into files.
> > 
> > Simply moving repo.invalidate() invocation into transaction scope
> > causes unexpected result, because it clears in-memory changes of
> > fncache before writing them out at closing transaction. "unexpected
> > result" requires rebuilding fncache by "hg debugrebuildfncache".
> 
> Does it mean store.invalidatecaches() should always be omitted in transaction?
> If that's true, I'd rather make repo.invalidate() skip it conditionally
> and add a comment why.

You are right. I'll send revised series.


> > -        # Writing straight to files circumvented the inmemory caches
> > -        repo.invalidate(clearfilecache=True)
> > +            # force @filecache properties to be reloaded from
> > +            # streamclone-ed file at next access
> > +            repo.invalidatefilecaches(clearcache=True)
> > +
> > +            # clear already cached @propertycache properties or so
> > +            repo.invalidatecaches()
> 
> It seems obscure that we should never call store.invalidatecaches() here.
> I think that is the matter of the localrepo.
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/streamclone.py b/mercurial/streamclone.py
--- a/mercurial/streamclone.py
+++ b/mercurial/streamclone.py
@@ -299,6 +299,20 @@  def consumev1(repo, fp, filecount, bytec
         repo.ui.progress(_('clone'), 0, total=bytecount, unit=_('bytes'))
         start = time.time()
 
+        # TODO: get rid of (potential) inconsistency
+        #
+        # If transaction is started and any @filecache property is
+        # changed at this point, it causes inconsistency between
+        # in-memory cached property and streamclone-ed file on the
+        # disk. Nested transaction prevents transaction scope "clone"
+        # below from writing in-memory changes out at the end of it,
+        # even though in-memory changes are discarded at the end of it
+        # regardless of transaction nesting.
+        #
+        # But transaction nesting can't be simply prohibited, because
+        # nesting occurs also in ordinary case (e.g. enabling
+        # clonebundles).
+
         with repo.transaction('clone'):
             with repo.svfs.backgroundclosing(repo.ui, expectedcount=filecount):
                 for i in xrange(filecount):
@@ -322,8 +336,12 @@  def consumev1(repo, fp, filecount, bytec
                                              total=bytecount, unit=_('bytes'))
                             ofp.write(chunk)
 
-        # Writing straight to files circumvented the inmemory caches
-        repo.invalidate(clearfilecache=True)
+            # force @filecache properties to be reloaded from
+            # streamclone-ed file at next access
+            repo.invalidatefilecaches(clearcache=True)
+
+            # clear already cached @propertycache properties or so
+            repo.invalidatecaches()
 
         elapsed = time.time() - start
         if elapsed <= 0:
diff --git a/tests/test-bundle.t b/tests/test-bundle.t
--- a/tests/test-bundle.t
+++ b/tests/test-bundle.t
@@ -310,9 +310,51 @@  Unpacking packed1 bundles with "hg unbun
 
 packed1 can be consumed from debug command
 
+(this also confirms that streamclone-ed changes are visible via
+@filecache properties to in-process procedures before closing
+transaction)
+
+  $ cat > $TESTTMP/showtip.py <<EOF
+  > from __future__ import absolute_import
+  > 
+  > def showtip(ui, repo, hooktype, **kwargs):
+  >     ui.warn('%s: %s\n' % (hooktype, repo['tip'].hex()[:12]))
+  > 
+  > def reposetup(ui, repo):
+  >     # this confirms (and ensures) that (empty) 00changelog.i
+  >     # before streamclone is already cached as repo.changelog
+  >     ui.setconfig('hooks', 'pretxnopen.showtip', showtip)
+  > 
+  >     # this confirms that streamclone-ed changes are visible to
+  >     # in-process procedures before closing transaction
+  >     ui.setconfig('hooks', 'pretxnclose.showtip', showtip)
+  > 
+  >     # this confirms that streamclone-ed changes are still visible
+  >     # after closing transaction
+  >     ui.setconfig('hooks', 'txnclose.showtip', showtip)
+  > EOF
+  $ cat >> $HGRCPATH <<EOF
+  > [extensions]
+  > showtip = $TESTTMP/showtip.py
+  > EOF
+
   $ hg -R packed debugapplystreamclonebundle packed.hg
   6 files to transfer, 2.60 KB of data
+  pretxnopen: 000000000000
+  pretxnclose: aa35859c02ea
   transferred 2.60 KB in *.* seconds (* */sec) (glob)
+  txnclose: aa35859c02ea
+
+(for safety, confirm visibility of streamclone-ed changes by another
+process, too)
+
+  $ hg -R packed tip -T "{node|short}\n"
+  aa35859c02ea
+
+  $ cat >> $HGRCPATH <<EOF
+  > [extensions]
+  > showtip = !
+  > EOF
 
 Does not work on non-empty repo