Patchwork [4,of,5] manifest: use property instead of field for manifest revlog storage

login
register
mail settings
Submitter Durham Goode
Date Aug. 17, 2016, 8:35 p.m.
Message ID <d0ad5567-0a34-a76c-d3f8-d3622be2348f@fb.com>
Download mbox | patch
Permalink /patch/16338/
State Not Applicable
Headers show

Comments

Durham Goode - Aug. 17, 2016, 8:35 p.m.
On 8/10/16 1:01 AM, FUJIWARA Katsunori wrote:
> At Mon, 8 Aug 2016 18:17:13 -0700,
> Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1470697899 25200
>> #      Mon Aug 08 16:11:39 2016 -0700
>> # Node ID 6a4c09571793d56c8dad1a6760e3fc1293b9a0b6
>> # Parent  f73abdf84e8e2eb9e2029cb28e2246a55b2d2f49
>> manifest: use property instead of field for manifest revlog storage
>>
>> The file caches we're using to avoid reloading the manifest from disk everytime
>> has an annoying bug that causes the in memory structure to not be reloaded if
>> the mtime and the size haven't changed. This causes a breakage in the tests
>> because the manifestlog is not being reloaded after a commit+strip operation in
>> mq (the mtime is the same because it all happens in the same second, and the
>> resulting size is the same because we add 1 and remove 1). The only reason this
>> doesn't affect the manifest itself is because we touch it so often that we
>> had already reloaded it after the commit, but before the strip.
> All of main tasks of ExactCacheValidationPlan were included into main
> repository before release of 3.9.
>
>      https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_wiki_ExactCacheValidationPlan&d=CwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=HQ7AyzrzLXk4BBn1y_0Ie6XNfoDjDZ9vlLuu7fzE-Nw&s=NEEztqsTZV1sABBrBz8cXUPhWSd1PHC-bmXCI4JPFM8&e=
>
> Therefore, change of file should be detected at validation of file
> cache, even if writing changed file out (without extra handling in
> this plan) causes same ctime/mtime/size/i-node and so on.
 From looking at filecachesubentry.changed(), it looks like it just does 
'self.cachestat != newstat'.  Where is the code that's supposed to 
handle this case?

Perhaps it was some issue with filecachesubentry.refresh().  For 
instance, it seems like all the filecache entries are 'refreshed' when a 
transaction closes.  If the series of events was:

1. read repo.manifest  (it's now in the file cache)
2. read repo.manifestlog  (it's now in the file cache)
3. lock, open transaction
4. add commit   (mq stuff; in memory manifest is refreshed)
5. strip different commit (mq stuff)
6. close transaction + lock  (refresh all file caches - i.e. blindly 
updates their stat data)
7. read manifestlog <- this doesn't read anything from disk because the 
close transaction told it that all filecaches are in good shape, but 
this was never read inside the transaction, and therefore it is not in 
good shape.

It's been several weeks since I investigated this bug, so my memory is 
fuzzy and this might not be the correct analysis.
> Would you tell me the case to reproduce the issue around caching
> manifest, if it still occurs on recent Mercurial ? It helps to find
> out code paths overlooked by ExactCacheValidationPlan.
>
$ hg pull -r 7f14ccb https://bitbucket.org/DurhamG/hg
$ hg up 7f14ccb

Apply this change:

$ ./run-tests.py test-mq-merge.t
Katsunori FUJIWARA - Aug. 18, 2016, 11:53 a.m.
At Wed, 17 Aug 2016 13:35:41 -0700,
Durham Goode wrote:
> 
> On 8/10/16 1:01 AM, FUJIWARA Katsunori wrote:
> > At Mon, 8 Aug 2016 18:17:13 -0700,
> > Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1470697899 25200
> >> #      Mon Aug 08 16:11:39 2016 -0700
> >> # Node ID 6a4c09571793d56c8dad1a6760e3fc1293b9a0b6
> >> # Parent  f73abdf84e8e2eb9e2029cb28e2246a55b2d2f49
> >> manifest: use property instead of field for manifest revlog storage
> >>
> >> The file caches we're using to avoid reloading the manifest from disk everytime
> >> has an annoying bug that causes the in memory structure to not be reloaded if
> >> the mtime and the size haven't changed. This causes a breakage in the tests
> >> because the manifestlog is not being reloaded after a commit+strip operation in
> >> mq (the mtime is the same because it all happens in the same second, and the
> >> resulting size is the same because we add 1 and remove 1). The only reason this
> >> doesn't affect the manifest itself is because we touch it so often that we
> >> had already reloaded it after the commit, but before the strip.
> > All of main tasks of ExactCacheValidationPlan were included into main
> > repository before release of 3.9.
> >
> >      https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_wiki_ExactCacheValidationPlan&d=CwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=HQ7AyzrzLXk4BBn1y_0Ie6XNfoDjDZ9vlLuu7fzE-Nw&s=NEEztqsTZV1sABBrBz8cXUPhWSd1PHC-bmXCI4JPFM8&e=
> >
> > Therefore, change of file should be detected at validation of file
> > cache, even if writing changed file out (without extra handling in
> > this plan) causes same ctime/mtime/size/i-node and so on.
>  From looking at filecachesubentry.changed(), it looks like it just does 
> 'self.cachestat != newstat'.  Where is the code that's supposed to 
> handle this case?

If file is:

  - created via util.atomictempfile with checkambig=True
  - renamed via vfs.rename() with checkambig=True
  - copied via util.copyfile() with checkambig=True

mtime of newly created file is +1, and this ensures that
"oldstat.mtime != newstat.mtime".

Therefore, just checking "self.cachestat != newstat" can detect change
of file (if ExactCacheValidationPlan works as expected :-))


> Perhaps it was some issue with filecachesubentry.refresh().  For 
> instance, it seems like all the filecache entries are 'refreshed' when a 
> transaction closes.  If the series of events was:
> 
> 1. read repo.manifest  (it's now in the file cache)
> 2. read repo.manifestlog  (it's now in the file cache)
> 3. lock, open transaction
> 4. add commit   (mq stuff; in memory manifest is refreshed)
> 5. strip different commit (mq stuff)
> 6. close transaction + lock  (refresh all file caches - i.e. blindly 
> updates their stat data)
> 7. read manifestlog <- this doesn't read anything from disk because the 
> close transaction told it that all filecaches are in good shape, but 
> this was never read inside the transaction, and therefore it is not in 
> good shape.
> 
> It's been several weeks since I investigated this bug, so my memory is 
> fuzzy and this might not be the correct analysis.
>
> > Would you tell me the case to reproduce the issue around caching
> > manifest, if it still occurs on recent Mercurial ? It helps to find
> > out code paths overlooked by ExactCacheValidationPlan.
> >
> $ hg pull -r 7f14ccb https://bitbucket.org/DurhamG/hg
> $ hg up 7f14ccb
> 
> Apply this change:
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -504,7 +504,7 @@ class localrepository(object):
>       def manifest(self):
>           return manifest.manifest(self.svfs)
> 
> -    @property
> +    @storecache('00manifest.i')
>       def manifestlog(self):
>           return manifest.manifestlog(self.svfs, self)
> 
> 
> $ ./run-tests.py test-mq-merge.t

Thank you for information! I can reproduce the issue (BTW, you would
mention "integrity check failed on 00manifest.i" at "hg qpush -m",
wouldn't you?)

I confirmed that this issue can be avoided by some extra changes for
ExactCacheValidationPlan. I'll send patch series tomorrow or so.


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Aug. 21, 2016, 1:12 p.m.
At Thu, 18 Aug 2016 20:53:02 +0900,
FUJIWARA Katsunori wrote:
> 
> 
> At Wed, 17 Aug 2016 13:35:41 -0700,
> Durham Goode wrote:
> > 
> > On 8/10/16 1:01 AM, FUJIWARA Katsunori wrote:
> > > At Mon, 8 Aug 2016 18:17:13 -0700,
> > > Durham Goode wrote:
> > >> # HG changeset patch
> > >> # User Durham Goode <durham@fb.com>
> > >> # Date 1470697899 25200
> > >> #      Mon Aug 08 16:11:39 2016 -0700
> > >> # Node ID 6a4c09571793d56c8dad1a6760e3fc1293b9a0b6
> > >> # Parent  f73abdf84e8e2eb9e2029cb28e2246a55b2d2f49
> > >> manifest: use property instead of field for manifest revlog storage
> > >>
> > >> The file caches we're using to avoid reloading the manifest from disk everytime
> > >> has an annoying bug that causes the in memory structure to not be reloaded if
> > >> the mtime and the size haven't changed. This causes a breakage in the tests
> > >> because the manifestlog is not being reloaded after a commit+strip operation in
> > >> mq (the mtime is the same because it all happens in the same second, and the
> > >> resulting size is the same because we add 1 and remove 1). The only reason this
> > >> doesn't affect the manifest itself is because we touch it so often that we
> > >> had already reloaded it after the commit, but before the strip.
> > > All of main tasks of ExactCacheValidationPlan were included into main
> > > repository before release of 3.9.
> > >
> > >      https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_wiki_ExactCacheValidationPlan&d=CwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=HQ7AyzrzLXk4BBn1y_0Ie6XNfoDjDZ9vlLuu7fzE-Nw&s=NEEztqsTZV1sABBrBz8cXUPhWSd1PHC-bmXCI4JPFM8&e=
> > >
> > > Therefore, change of file should be detected at validation of file
> > > cache, even if writing changed file out (without extra handling in
> > > this plan) causes same ctime/mtime/size/i-node and so on.
> >  From looking at filecachesubentry.changed(), it looks like it just does 
> > 'self.cachestat != newstat'.  Where is the code that's supposed to 
> > handle this case?
> 
> If file is:
> 
>   - created via util.atomictempfile with checkambig=True
>   - renamed via vfs.rename() with checkambig=True
>   - copied via util.copyfile() with checkambig=True
> 
> mtime of newly created file is +1, and this ensures that
> "oldstat.mtime != newstat.mtime".
> 
> Therefore, just checking "self.cachestat != newstat" can detect change
> of file (if ExactCacheValidationPlan works as expected :-))
> 
> 
> > Perhaps it was some issue with filecachesubentry.refresh().  For 
> > instance, it seems like all the filecache entries are 'refreshed' when a 
> > transaction closes.  If the series of events was:
> > 
> > 1. read repo.manifest  (it's now in the file cache)
> > 2. read repo.manifestlog  (it's now in the file cache)
> > 3. lock, open transaction
> > 4. add commit   (mq stuff; in memory manifest is refreshed)
> > 5. strip different commit (mq stuff)
> > 6. close transaction + lock  (refresh all file caches - i.e. blindly 
> > updates their stat data)
> > 7. read manifestlog <- this doesn't read anything from disk because the 
> > close transaction told it that all filecaches are in good shape, but 
> > this was never read inside the transaction, and therefore it is not in 
> > good shape.
> > 
> > It's been several weeks since I investigated this bug, so my memory is 
> > fuzzy and this might not be the correct analysis.
> >
> > > Would you tell me the case to reproduce the issue around caching
> > > manifest, if it still occurs on recent Mercurial ? It helps to find
> > > out code paths overlooked by ExactCacheValidationPlan.
> > >
> > $ hg pull -r 7f14ccb https://bitbucket.org/DurhamG/hg
> > $ hg up 7f14ccb
> > 
> > Apply this change:
> > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > --- a/mercurial/localrepo.py
> > +++ b/mercurial/localrepo.py
> > @@ -504,7 +504,7 @@ class localrepository(object):
> >       def manifest(self):
> >           return manifest.manifest(self.svfs)
> > 
> > -    @property
> > +    @storecache('00manifest.i')
> >       def manifestlog(self):
> >           return manifest.manifestlog(self.svfs, self)
> > 
> > 
> > $ ./run-tests.py test-mq-merge.t
> 
> Thank you for information! I can reproduce the issue (BTW, you would
> mention "integrity check failed on 00manifest.i" at "hg qpush -m",
> wouldn't you?)

I found the root cause of "integrity check failed on 00manifest.i" at
"hg qpush -m" while additional working for ExactCacheValidationPlan.

At first, preconditions:

  A. at closing transaction, _refreshfilecachestats() is invoked to
     refresh filestat of filecache-ed properties in localrepository.

     This expects that all cached properties should be consistent with
     committed transaction result.

       https://selenic.com/repo/hg/file/3.9/mercurial/localrepo.py#l1090

  B. truncation while strip() is also causes _refreshfilecachestats(),
     because stripping logic is executed under the transaction

     Therefore, keeping filecache properties even after strip() is
     expected behavior.

  C. repo.invalidate() makes filecache properties a kind of "pending"
     status

     Cached objects are still kept in repo._filecache, but they are
     removed from repo.__dict__.

     This "pending" status is cleared, when a property is accessed
     (and filestat isn't changed on disk).

     repo.invalidate() is executed at the beginning of (s)lock scope.

  D. _refreshfilecachestats() omits updating filestat of filecache
     property, if that property is "pending"

     "k not in self.__dict__" case:

       https://selenic.com/repo/hg/file/3.9/mercurial/localrepo.py#l1271

Then, at 7f14ccbbadab in your repo:

  1. updating filestat of repo.manifest property is omitted while some
     transactions before strip() in the issue case

     because repo.commitctx() uses repo.manifestlog instead of
     repo.manifest.

  2. the object instantiated for repo.manifest is also cached by
     repo.manifestlog._revlog field

     Call the former MF1, and the latter MFL1

  3. accessing to repo.manifest at the beginning of strip() causes
     re-instantiating manifest object

     because filestat of repo.manifest property differs from actual
     filestat of 00manifest.i file on disk at this point: see (1)
     above.

     Call newly created object MF2.

  4. stripping applies strip() on MF2

     This removes some nodes added before stripping from MF2.

     But MF1 referred by MFL1 still contains stripped nodes. In
     addition to it, _mancache of MFL1 also contains already stripped
     nodes.

     At this point, MFL1 breaks precondition (A): cached property
     should be consistent with committed transaction result.

  5. subsequent committing might cause inconsistent result because of
     invalid MFL1

     for example, in test-mq-merge.t case, both #6 and #7 entries in
     MF1 have linkrev=6 (I'm not sure about detail flow of
     inconsistent result).



> I confirmed that this issue can be avoided by some extra changes for
> ExactCacheValidationPlan. I'll send patch series tomorrow or so.

FYI, mergeone() has 3 lock scopes and 4 transaction scopes detecting
integrity error as below :-)

    mergeone():
        self.apply():
            with repo.lock(): # repo.invalidate(), here
                with repo.transaction():
                    do_committing
                # repo._refreshfilecachestats(), here
        strip.strip():
            with repo.lock(): # repo.invalidate(), here
                repair.strip():
                    with repo.transaction():
                        do_stripping
                    # repo._refreshfilecachestats(), here
                    with repo.transaction():
                        do_bookmark_repairing
                    # repo._refreshfilecachestats(), here
        # (*)
        newcommit(): # => repo.commit()
            with repo.lock(): # repo.invalidate(), here
                with repo.transaction():
                    do_committing
                # repo._refreshfilecachestats(), here
        self.printdiff():
            cmdutil.diffordiffstat():
                # integrity error is detected here

"some extra changes for ExactCacheValidationPlan" makes
repo.manifestlog property detect change of 00manifest.i on disk at (*)
in above flow. repo.manifestlog is "pending" while stripping, because
repair.strip() only changes repo.manifest.


> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Sept. 2, 2016, 6:52 p.m.
At Wed, 17 Aug 2016 13:35:41 -0700,
Durham Goode wrote:
> 
> 
> On 8/10/16 1:01 AM, FUJIWARA Katsunori wrote:
> > At Mon, 8 Aug 2016 18:17:13 -0700,
> > Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1470697899 25200
> >> #      Mon Aug 08 16:11:39 2016 -0700
> >> # Node ID 6a4c09571793d56c8dad1a6760e3fc1293b9a0b6
> >> # Parent  f73abdf84e8e2eb9e2029cb28e2246a55b2d2f49
> >> manifest: use property instead of field for manifest revlog storage
> >>
> >> The file caches we're using to avoid reloading the manifest from disk everytime
> >> has an annoying bug that causes the in memory structure to not be reloaded if
> >> the mtime and the size haven't changed. This causes a breakage in the tests
> >> because the manifestlog is not being reloaded after a commit+strip operation in
> >> mq (the mtime is the same because it all happens in the same second, and the
> >> resulting size is the same because we add 1 and remove 1). The only reason this
> >> doesn't affect the manifest itself is because we touch it so often that we
> >> had already reloaded it after the commit, but before the strip.

> > Would you tell me the case to reproduce the issue around caching
> > manifest, if it still occurs on recent Mercurial ? It helps to find
> > out code paths overlooked by ExactCacheValidationPlan.
> >
> $ hg pull -r 7f14ccb https://bitbucket.org/DurhamG/hg
> $ hg up 7f14ccb
> 
> Apply this change:
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -504,7 +504,7 @@ class localrepository(object):
>       def manifest(self):
>           return manifest.manifest(self.svfs)
> 
> -    @property
> +    @storecache('00manifest.i')
>       def manifestlog(self):
>           return manifest.manifestlog(self.svfs, self)
> 
> 
> $ ./run-tests.py test-mq-merge.t
> 

Sorry for late followup. I had investigated unexpected filecache
behavior.

Let me re-summarize the issue.


At first, "file stat ambiguity of 00manifest.i around stripping" isn't
the root cause of unexpected failure at making repo.manifestlog
@storecache, even though such ambiguity itself should be fixed.

It is a kind of accidental side effect that fixing such ambiguity
avoids failure in some cases (detail is explained later).


The root cause of failure at making repo.manifestlog @storecache is
that "manifest" object referred by repo.manifest diverges from one
referred by manifestctx cached in manifestlog.

After Durham's works (including ones not yet queued into main repo),
"manifest" object is often referred via _revlog of cached manifestctx
instead of repo.manifest. This prevents file stat for repo.manifest
property from being refreshed.

It is important that @filecache-ed repo.manifest property focuses not
on actual using cached "manifest" object, but accessing via
repo.manifest.

Therefore, un-refreshed file stat causes unexpected reloading
repo.manifest from file, even though referred "manifest" object itself
is still valid at that time.

Then, repo.manifest and manifestctx._revlog don't refer same
"manifest" object (call this "manifest divergence" in this
explanation). This "manifest divergence" causes inconsistency.

Typical steps below can be reproduced in test-revlog-ancestry.py.

  1. at the first transaction:

     assumption: only cached manifestctx is required

     1-1. repo.manifest is bypassed by cached manifestctx, because it
          already refers "manifest" object

     1-2. operations apply changes on manifestctx._revlog

     1-3. changes on manifestctx._revlog are written into 00manifest.i

     1-4. closing transaction doesn't refresh file stat of repo.manifest
          because of bypassing at (1-1)

  2. at the second transaction:

     assumption: new (= not cached) manifestctx is required

     2-1. accessing to repo.manifest reloads 00manifest.i,
          because file stat of it isn't refreshed at (1-3)

          e.g. merging with or committing on old revision causes
          accessing to repo.manifest. BTW, failure in test-mq-merge.t
          also occurs just after committing merged result.

          Here, reloaded "manifest" diverges from _revlog of already
          cached manifestctx.

     2-2. operations apply changes on _revlog of cached manifestctx

          repo.manifest isn't aware of these changes, because of its
          divergence.

     2-3. changes on manifestctx._revlog are written into 00manifest.i

     2-4. closing transaction refreshes file stat of not only
          repo.manifestlog but also repo.manifest, even though the
          latter doesn't contain

          Accessing to it at (2-1) causes this refreshing.

  3. then:

     3-1. accessing to repo.manifest reuses already cached "manifest"
          object, because file stat of it is refreshed at (2-4)

     3-2. but looking node created at (2-2) up in it causes failure,
          because it doesn't contain that node

This issue can be fixed by:

  - disable _mancache in manifestlog (e.g. make cachesize of _mancache 0)

    This easily avoids "manifest divergence" between repo.manifest and
    manifestctx._revlog (and explains that it is the root cause of
    issue).

    Of course, this isn't reasonable solution :-)

  - make manifestlog._revlog normal method

    This forces manifestctx to invoke manifestlog._revlog() or so when
    it needs "manifest" object, though.

  - and so on (introducing a proxy object to avoid direct linking
    between manifestctx and "manifest" object, etc ...)



Failure at "hg qpush -m" in test-mq-merge.t is a little complicated
variant.

  a. at the first transaction to add new revision:

     Refreshing file stat of @filecache properties doesn't work
     correctly, if transaction is started via "filtered repo" object.

     Explicit repo.transaction() invocation easily causes this
     situation.

     I just posted patch series to fix this issue. Please see below
     for detail.

       https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087868.html

  b. at the second transaction to strip added revision:

     b-1. accessing to repo.manifest reloads 00manifest.i, because of (a)

     b-2. but repo.manifestlog (and cached manifestctx) is still kept,
          because repair.strip() doesn't refer it

     Here, "manifest" reloaded at (b-1) diverges from _revlog of
     already cached manifestctx.

     (b-2) also causes:

     b-3. refreshing file stat of repo.manifestlog is omitted

     b-4. "manifest" referred by cached manifestctx still contains garbage

  c. at the third transaction

     c-1. repo.manifestlog (and cached manifestctx) is unintentionally
          reused, because (a) and (b-3) causes file stat ambiguity

     c-2. using garbage in _revlog of cached manifestctx causes
          integrity issue of newly created commit

There are some solutions below to avoid such failure by side effect,
but failure in test-revlog-ancestry.py can't be fixed even if all of
them are applied.

  - refresh file stat correctly

    This avoids unexpected behavior at (a)

  - refer repo.manifestlog in repair.strip()

    This avoids unexpected behavior at (b-2).

    Just referring is enough to avoid "manifest divergence" at (b).
    But this doesn't work as expected, if "manifest divergence"
    already occurred.

  - avoid file stat ambiguity around truncation

    This avoids unexpected behavior at (c-1)


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -504,7 +504,7 @@  class localrepository(object):
      def manifest(self):
          return manifest.manifest(self.svfs)

-    @property
+    @storecache('00manifest.i')
      def manifestlog(self):
          return manifest.manifestlog(self.svfs, self)