Patchwork [5,of,6,V2] obscache: instantiate the cache and keep it warm

login
register
mail settings
Submitter Pierre-Yves David
Date May 20, 2017, 3:30 p.m.
Message ID <3c2a082a590aa8b57693.1495294219@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/20775/
State Deferred
Headers show

Comments

Pierre-Yves David - May 20, 2017, 3:30 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1495198006 -7200
#      Fri May 19 14:46:46 2017 +0200
# Node ID 3c2a082a590aa8b57693c24b8461c2afdb8d5556
# Parent  eb72bf5f892c01171513f9ebfef69c8a3dbdb1ab
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 3c2a082a590a
obscache: instantiate the cache and keep it warm

We are not using it yet, but we make sure we have a cache and that we keep it up
to date after each transaction.

The two "warning: ignoring unknown working parent" output in the tests are
caused by 'blackbox' accessing the dirstate during the strip. I've not found an
easy way to work around this so I kept them as they are harmless.
Jun Wu - May 23, 2017, 4:16 a.m.
Excerpts from Pierre-Yves David's message of 2017-05-20 17:30:19 +0200:
> [...]
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -1272,6 +1272,9 @@ class localrepository(object):
>              self.ui.debug('updating the branch cache\n')
>              branchmap.updatecache(self.filtered('served'))
>  
> +        self.obsstore.obscache.update(self)
> +        self.obsstore.obscache.save(self)

It seems the cache will not be saved for read operations, but is always
rebuilt when accessed.

That means, without a cache, hg commands like "hg id" at Facebook will take
9 seconds no matter how many times you run it (as long as no write operation
is executed). 

I think it might be better to just fallback to the old code path if the
cache does not exist.


When thinking about it more, the only performance difference with the index
approach is the overhead converting between revs and nodes. It seems
possible to control that overhead to about 20ms for 6k revisions.

If the performance difference is small (like 20ms), I wonder if it's still
worthwhile to go the obscache approach since it's not that simple and the
initial cache build time is not very pleasant for huge repos. That could be
a perf hit for automations which reclones.

> [...]
Pierre-Yves David - May 23, 2017, 9:49 a.m.
(I unified the two emails to keep the discussion simpler)

On 05/23/2017 06:16 AM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2017-05-20 17:30:19 +0200:
>> [...]
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -1272,6 +1272,9 @@ class localrepository(object):
>>              self.ui.debug('updating the branch cache\n')
>>              branchmap.updatecache(self.filtered('served'))
>>
>> +        self.obsstore.obscache.update(self)
>> +        self.obsstore.obscache.save(self)
>
> It seems the cache will not be saved for read operations, but is always
> rebuilt when accessed.

This is correct for the state achieved after this series. I left out 
smarter logic in this area to keep it simpler (see comment in patch 6). 
There will be quick follow up to address that.

> That means, without a cache, hg commands like "hg id" at Facebook will take
> 9 seconds no matter how many times you run it (as long as no write operation
> is executed).
>
> I think it might be better to just fallback to the old code path if the
> cache does not exist.

This is exactly what the follow up series will be about. We are using 
this approach in the evolve extensions for a while. If I'm not mistaken 
you are likely shipping that code to your users for a couple of weeks.

Associated code in evolve:

 
https://www.mercurial-scm.org/repo/evolve/file/tip/hgext3rd/evolve/obscache.py#l464

Relevant comment in patch 6:

 >> +    # XXX There are a couple of case where the cache could not be 
up to date:
 >> +    #
 >> +    # 1) no transaction happened in the repository since the upgrade,
 >> +    # 2) both old and new client touches that repository
 >> +    #
 >> +    # recomputing the whole cache in these case is a bit slower 
that using the
 >> +    # good old version (parsing markers and checking them). We 
could add some
 >> +    # logic to fall back to the old way in these cases.
 >> +    obscache = repo.obsstore.obscache
 >> +    obscache.update(repo) # ensure it is up to date:
 >> +    isobs = obscache.get
 >> +
 >> +    # actually compute the obsolete set

> When thinking about it more, the only performance difference with the index
> approach is the overhead converting between revs and nodes. It seems
> possible to control that overhead to about 20ms for 6k revisions.

The node lookup in the indexes is also going to be slower than the 
bytearray indexing. So I expect some extra overhead here too.

> If the performance difference is small (like 20ms), I wonder if it's still
> worthwhile to go the obscache approach since it's not that simple and the
> initial cache build time is not very pleasant for huge repos. That could be
> a perf hit for automations which reclones.

It seems to be still worthwhile. 20ms is still about x20 slower than the 
order of magnitude we are talking about. (+ lookup cost)

(For other reader, here is the current timing record and analysis:
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-May/098348.html)

There are also couple of easy thing we can do to avoid full cache 
invalidation.

On 05/23/2017 04:22 AM, Jun Wu wrote:
> I have got a chance to test this in production with one of our biggest repo
> and a 18MB obsstore. I tried running "hg id" with a a minimal hgrc (with
> just lz4revlog and remotefilelog enabled) and without an existing obscache.
>
> dualsourcecache.update takes 9 seconds. 7 seconds are used to scan
> changelog, 0.36 seconds used to scan obsstore.
>
> The related profiling data is as below:
>
>   9735       | _computeobsoleteset              obsolete.py:1554
>   9421        \ update                          obsolete.py:1227
>   1872          \ _upgradeneeded                obsolete.py:1311
>    434            \ revs (many times)           changelog.py:319
>   1179            \ __iter__                    obsolete.py:564
>   1179             | __get__                    util.py:798
>   1177             | _all                       obsolete.py:707
>   1138             | _readmarkers               obsolete.py:444
>   1138             | _fm1readmarkers            obsolete.py:432
>   7549          \ _updatefrom                   obsolete.py:1433
> * 7185            \ _updaterevs                 obsolete.py:1439
>    766              \ __get__                   util.py:798
>    766               | successors               obsolete.py:717
>    766               | _addsuccessors           obsolete.py:506
> * 3964              \ node (many times)         changelog.py:359
>    364            \ _updatemarkers              obsolete.py:1471
>
> So it is expensive for us to rebuild the cache. It seems the cache may be
> dropped in some cases (ex. histedit --abort). I wonder if we can make the
> cache more persistent so it'll be impossible to be nuked entirely, like when
> doing a strip, just truncate the cache file to the strip point instead of
> nuking it.

The simplest solution I can see here is to backup the cache when 
histedit/rebase starts and reinstall the cache when the operation is 
aborted. That way we can incrementally upgrade from that backup point. 
This approach can be generalized to other caches suffering from strip.

Stripping the cache is also an option, but only in the case where only 
the changelog has been stripped (not the obsstore).

(note: there are likely simple optimization lurking in update function 
too. For example if len(obsstore) << len(repo), we could focus on 
iterating on the obsstore only when rebuilding the cache, etc…).



Does these details address your concerns?

Cheers.
Jun Wu - May 23, 2017, 3:07 p.m.
Excerpts from Pierre-Yves David's message of 2017-05-23 11:49:49 +0200:
> [...]
> Does these details address your concerns?

There are some automation which does reclones. Reclone could be done in
seconds. 9 seconds obscache build time does look a bit painful for them.
So I'd like to see a config option to opt out the cache.

> Cheers.
Pierre-Yves David - May 23, 2017, 6:20 p.m.
On 05/23/2017 05:07 PM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2017-05-23 11:49:49 +0200:
>> [...]
>> Does these details address your concerns?
>
> There are some automation which does reclones. Reclone could be done in
> seconds. 9 seconds obscache build time does look a bit painful for them.

Out of curiosity, what order of magnitude of "seconds" are we talking 
about here? What kind of extra setup do you use (beside remotefilelog) 
to achieves that kind of timing for such large repository ?

> So I'd like to see a config option to opt out the cache.

I can certainly adds that.

Cheers
Jun Wu - May 23, 2017, 11:03 p.m.
Excerpts from Pierre-Yves David's message of 2017-05-23 20:20:49 +0200:
> 
> On 05/23/2017 05:07 PM, Jun Wu wrote:
> > Excerpts from Pierre-Yves David's message of 2017-05-23 11:49:49 +0200:
> >> [...]
> >> Does these details address your concerns?
> >
> > There are some automation which does reclones. Reclone could be done in
> > seconds. 9 seconds obscache build time does look a bit painful for them.
> 
> Out of curiosity, what order of magnitude of "seconds" are we talking 
> about here? What kind of extra setup do you use (beside remotefilelog) 
> to achieves that kind of timing for such large repository ?

The initial stream clone takes 13 seconds to download full changelog and
manifest. Update is slow but a user-space filesystem will remove that cost.
In the near future, we may migrate to some kind of lazy manifest so stream
clone can skip manifest revlog and the time could be cut in half.

> > So I'd like to see a config option to opt out the cache.
> 
> I can certainly adds that.
> 
> Cheers
>
Pierre-Yves David - May 24, 2017, 12:03 p.m.
On 05/24/2017 01:03 AM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2017-05-23 20:20:49 +0200:
>>
>> On 05/23/2017 05:07 PM, Jun Wu wrote:
>>> Excerpts from Pierre-Yves David's message of 2017-05-23 11:49:49 +0200:
>>>> [...]
>>>> Does these details address your concerns?
>>>
>>> There are some automation which does reclones. Reclone could be done in
>>> seconds. 9 seconds obscache build time does look a bit painful for them.
>>
>> Out of curiosity, what order of magnitude of "seconds" are we talking
>> about here? What kind of extra setup do you use (beside remotefilelog)
>> to achieves that kind of timing for such large repository ?
>
> The initial stream clone takes 13 seconds to download full changelog and
> manifest. Update is slow but a user-space filesystem will remove that cost.
> In the near future, we may migrate to some kind of lazy manifest so stream
> clone can skip manifest revlog and the time could be cut in half.

I seee, So if the obscache was included into the streaming clone, that 
would remove the impact to clone.

Cheers,
Jun Wu - May 24, 2017, 5:39 p.m.
Excerpts from Pierre-Yves David's message of 2017-05-24 14:03:22 +0200:
> 
> On 05/24/2017 01:03 AM, Jun Wu wrote:
> > Excerpts from Pierre-Yves David's message of 2017-05-23 20:20:49 +0200:
> >>
> >> On 05/23/2017 05:07 PM, Jun Wu wrote:
> >>> Excerpts from Pierre-Yves David's message of 2017-05-23 11:49:49 +0200:
> >>>> [...]
> >>>> Does these details address your concerns?
> >>>
> >>> There are some automation which does reclones. Reclone could be done in
> >>> seconds. 9 seconds obscache build time does look a bit painful for them.
> >>
> >> Out of curiosity, what order of magnitude of "seconds" are we talking
> >> about here? What kind of extra setup do you use (beside remotefilelog)
> >> to achieves that kind of timing for such large repository ?
> >
> > The initial stream clone takes 13 seconds to download full changelog and
> > manifest. Update is slow but a user-space filesystem will remove that cost.
> > In the near future, we may migrate to some kind of lazy manifest so stream
> > clone can skip manifest revlog and the time could be cut in half.
> 
> I seee, So if the obscache was included into the streaming clone, that 
> would remove the impact to clone.

One of our goals is to avoid using revlog for scalability. Adding new
dependence to revlog means we need more monkey patching to get rid of them
later. So I think a config option could be a better choice if we do think
obscache and its 20ms speedup is worthwhile.

> 
> Cheers,
>
Augie Fackler - May 24, 2017, 5:41 p.m.
> On May 24, 2017, at 13:39, Jun Wu <quark@fb.com> wrote:
> 
>> I seee, So if the obscache was included into the streaming clone, that 
>> would remove the impact to clone.
> 
> One of our goals is to avoid using revlog for scalability. Adding new
> dependence to revlog means we need more monkey patching to get rid of them
> later. So I think a config option could be a better choice if we do think
> obscache and its 20ms speedup is worthwhile.

I'm extremely sympathetic to wanting to avoid new deps on revlog, but in this case we're talking about a 500ms win on 'hg id' in the Mercurial repository, which is significant savings.

Realistically, any non-revlog storage engine is going to have to work out similar cache invalidation hooks anyway, isn't it?
Jun Wu - May 24, 2017, 5:48 p.m.
Excerpts from Augie Fackler's message of 2017-05-24 13:41:53 -0400:
> I'm extremely sympathetic to wanting to avoid new deps on revlog, but in
> this case we're talking about a 500ms win on 'hg id' in the Mercurial
> repository, which is significant savings.

I have already demonstrated the index works. They are tagged as "RFC" but
actually quite complete and I can polish them up in 2 days. And my latest
analysis indicates it could be made to be just 20ms difference from
obscache.

I'd note that obscache is also incomplete in this series - the way it
handles strip (could be triggered by histedit/rebase --abort) is bad for
huge repos (the next command takes 7 seconds).

When talking about performance. Please take that 7 seconds after strip (or
histedit --abort) into consideration and not only just read operations.

> Realistically, any non-revlog storage engine is going to have to work out
> similar cache invalidation hooks anyway, isn't it?

True. But the index stuff does not depend on changelog. Its single source of
truth is obsstore. So it's easier.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1272,6 +1272,9 @@  class localrepository(object):
             self.ui.debug('updating the branch cache\n')
             branchmap.updatecache(self.filtered('served'))
 
+        self.obsstore.obscache.update(self)
+        self.obsstore.obscache.save(self)
+
     def invalidatecaches(self):
 
         if '_tagscache' in vars(self):
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -557,6 +557,7 @@  class obsstore(object):
         self.svfs = repo.svfs
         self._version = defaultformat
         self._readonly = readonly
+        self.obscache = obscache(repo)
 
     def __iter__(self):
         return iter(self._all)
diff --git a/tests/test-blackbox.t b/tests/test-blackbox.t
--- a/tests/test-blackbox.t
+++ b/tests/test-blackbox.t
@@ -59,10 +59,10 @@  clone, commit, pull
   added 1 changesets with 1 changes to 1 files
   (run 'hg update' to get a working copy)
   $ hg blackbox -l 6
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated served branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote served branch cache with 1 labels and 2 nodes
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> 1 incoming changes - new heads: d02f48003e62
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated obscache in * seconds (3r, 0o) (glob)
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> 1 incoming changes - new heads: d02f48003e62 (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull exited 0 after * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> blackbox -l 6
 
@@ -121,8 +121,8 @@  backup bundles get logged
   0 files updated, 0 files merged, 1 files removed, 0 files unresolved
   saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/*-backup.hg (glob)
   $ hg blackbox -l 6
-  1970/01/01 00:00:00 bob @73f6ee326b27d820b0472f1a825e3a50f3dc489b (5000)> strip tip
-  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/73f6ee326b27-7612e004-backup.hg (glob)
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> strip detected, obscache cache reset
+  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated obscache in * seconds (3r, 0o) (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated base branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote base branch cache with 1 labels and 2 nodes
   1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> strip tip exited 0 after * seconds (glob)
@@ -194,9 +194,9 @@  log rotation
   
   result: None
   $ hg blackbox
-  1970/01/01 00:00:00 bob @0e46349438790c460c5c9f7546bfcd39b267bbd2 (5000)> commit -m commit2 -d 2000-01-02 foo
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updated served branch cache in * seconds (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> wrote served branch cache with 1 labels and 1 nodes
+  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updated obscache in * seconds (1r, 0o) (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> commit -m commit2 -d 2000-01-02 foo exited 0 after * seconds (glob)
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> log -r 0
   1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> writing .hg/cache/tags2-visible with 0 tags
diff --git a/tests/test-clone.t b/tests/test-clone.t
--- a/tests/test-clone.t
+++ b/tests/test-clone.t
@@ -35,6 +35,7 @@  Trigger branchcache creation:
   checklink (symlink !)
   checklink-target (symlink !)
   checknoexec (execbit !)
+  obscache-v01
   rbc-names-v1
   rbc-revs-v1
 
diff --git a/tests/test-debugcommands.t b/tests/test-debugcommands.t
--- a/tests/test-debugcommands.t
+++ b/tests/test-debugcommands.t
@@ -118,6 +118,7 @@  Test cache warming command
   $ ls -r .hg/cache/*
   .hg/cache/rbc-revs-v1
   .hg/cache/rbc-names-v1
+  .hg/cache/obscache-v01
   .hg/cache/branch2-served
 
   $ cd ..
diff --git a/tests/test-fncache.t b/tests/test-fncache.t
--- a/tests/test-fncache.t
+++ b/tests/test-fncache.t
@@ -86,6 +86,7 @@  Non store repo:
   .hg/00manifest.i
   .hg/cache
   .hg/cache/branch2-served
+  .hg/cache/obscache-v01
   .hg/cache/rbc-names-v1
   .hg/cache/rbc-revs-v1
   .hg/data
@@ -118,6 +119,7 @@  Non fncache repo:
   .hg/00changelog.i
   .hg/cache
   .hg/cache/branch2-served
+  .hg/cache/obscache-v01
   .hg/cache/rbc-names-v1
   .hg/cache/rbc-revs-v1
   .hg/dirstate
diff --git a/tests/test-hardlinks.t b/tests/test-hardlinks.t
--- a/tests/test-hardlinks.t
+++ b/tests/test-hardlinks.t
@@ -230,6 +230,7 @@  r4 has hardlinks in the working dir (not
   2 r4/.hg/cache/checkisexec (execbit !)
   ? r4/.hg/cache/checklink-target (glob) (symlink !)
   2 r4/.hg/cache/checknoexec (execbit !)
+  2 r4/.hg/cache/obscache-v01
   2 r4/.hg/cache/rbc-names-v1
   2 r4/.hg/cache/rbc-revs-v1
   2 r4/.hg/dirstate
@@ -276,6 +277,7 @@  Update back to revision 11 in r4 should 
   2 r4/.hg/cache/checkisexec (execbit !)
   2 r4/.hg/cache/checklink-target (symlink !)
   2 r4/.hg/cache/checknoexec (execbit !)
+  2 r4/.hg/cache/obscache-v01
   2 r4/.hg/cache/rbc-names-v1
   2 r4/.hg/cache/rbc-revs-v1
   1 r4/.hg/dirstate
diff --git a/tests/test-inherit-mode.t b/tests/test-inherit-mode.t
--- a/tests/test-inherit-mode.t
+++ b/tests/test-inherit-mode.t
@@ -66,6 +66,7 @@  new directories are setgid
   00600 ./.hg/00changelog.i
   00770 ./.hg/cache/
   00660 ./.hg/cache/branch2-served
+  00660 ./.hg/cache/obscache-v01
   00660 ./.hg/cache/rbc-names-v1
   00660 ./.hg/cache/rbc-revs-v1
   00660 ./.hg/dirstate
@@ -115,6 +116,7 @@  group can still write everything
   00660 ../push/.hg/00changelog.i
   00770 ../push/.hg/cache/
   00660 ../push/.hg/cache/branch2-base
+  00660 ../push/.hg/cache/obscache-v01
   00660 ../push/.hg/dirstate
   00660 ../push/.hg/requires
   00770 ../push/.hg/store/
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -537,6 +537,7 @@  Strip 1: expose an old head:
 
   $ hg --config extensions.mq= strip 5
   saved backup bundle to $TESTTMP/t3/.hg/strip-backup/*-backup.hg (glob)
+  warning: ignoring unknown working parent 735c3ca72986!
   $ hg tags                  # partly stale cache
   tip                                5:735c3ca72986
   bar                                1:78391a272241
@@ -548,6 +549,7 @@  Strip 2: destroy whole branch, no old he
 
   $ hg --config extensions.mq= strip 4
   saved backup bundle to $TESTTMP/t3/.hg/strip-backup/*-backup.hg (glob)
+  warning: ignoring unknown working parent 735c3ca72986!
   $ hg tags                  # partly stale
   tip                                4:735c3ca72986
   bar                                0:bbd179dfa0a7
@@ -676,6 +678,7 @@  Missing tags2* files means the cache was
   checklink (symlink !)
   checklink-target (symlink !)
   hgtagsfnodes1
+  obscache-v01
 
 Cache should contain the head only, even though other nodes have tags data
 
@@ -701,6 +704,7 @@  Running hg tags should produce tags2* fi
   checklink (symlink !)
   checklink-target (symlink !)
   hgtagsfnodes1
+  obscache-v01
   tags2-visible
 
   $ f --size --hexdump tagsclient/.hg/cache/hgtagsfnodes1