Patchwork [RFC/EXAMPLE] tags: have a different cache file per filter level

login
register
mail settings
Submitter Pierre-Yves David
Date March 22, 2015, 1:33 a.m.
Message ID <e7a02537f7084036e2de.1426987981@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/8217/
State Superseded
Commit b061a2049662c5320c7a4af892c67ddd0dcb44d9
Headers show

Comments

Pierre-Yves David - March 22, 2015, 1:33 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1426968613 25200
#      Sat Mar 21 13:10:13 2015 -0700
# Node ID e7a02537f7084036e2dec01129e96d736f5ea706
# Parent  1cfded2fa1a92ee9b55d1f62675569e340a39083
tags: have a different cache file per filter level

Currently whichever filter level ask for tags last will write the data on disk.
This create massive issue when tags are read for "visible" and "unfiltered"
on large and multi headed repository (like Mozilla central). See issue4550 for
details

We'll want the various cache level to collaborate together (same as want
branchhead cache are doing). But this is not done in this changesets.
Pierre-Yves David - March 22, 2015, 1:35 a.m.
On 03/21/2015 06:33 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1426968613 25200
> #      Sat Mar 21 13:10:13 2015 -0700
> # Node ID e7a02537f7084036e2dec01129e96d736f5ea706
> # Parent  1cfded2fa1a92ee9b55d1f62675569e340a39083
> tags: have a different cache file per filter level

This patch is mostly provided as an example to help move forward on 
issue 4550. We should probably have collaboration before accepting such 
patch. However this patch would probably be an improvement for mozilla's 
repo user.
Gregory Szorc - March 23, 2015, 4:18 p.m.
On Sat, Mar 21, 2015 at 6:33 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1426968613 25200
> #      Sat Mar 21 13:10:13 2015 -0700
> # Node ID e7a02537f7084036e2dec01129e96d736f5ea706
> # Parent  1cfded2fa1a92ee9b55d1f62675569e340a39083
> tags: have a different cache file per filter level
>
> Currently whichever filter level ask for tags last will write the data on
> disk.
> This create massive issue when tags are read for "visible" and "unfiltered"
> on large and multi headed repository (like Mozilla central). See issue4550
> for
> details
>
> We'll want the various cache level to collaborate together (same as want
> branchhead cache are doing). But this is not done in this changesets.
>
>
I strongly support a separate cache per filter. And, I think this patch is
a step in the right direction and would address the immediate concerns in
issue 4550.

However, I have concerns:

1) Each cache will redundantly compute the same .hgtags filenode mappings
for many heads. On large repos (in terms of files and heads), the
performance penalty could be significant.

2) I believe we still have lingering issues with cache invalidation when
some history rewriting operations occur. The bulk of the performance issues
are due to .hgtags filenode resolution. If we could make that mapping less
prone to invalidation, it would eliminate most performance issues.

3) Since clients need to create the tags cache after clone and this could
be expensive, I'd eventually like to "bootstrap" the process via a bundle2
part (containing probably just the .hgtags filenode mapping). This should
be considered for any tags cache refactor.

4) Given 1-3, if we adopt this patch as-is and refactor tags cache later,
we'll have to create a new tags cache file implementation. Part of me
thinks we should do all this at one time so there is 1 transition, not 2.

Between my earlier patch to create a new .hgtags filenode cache and this
patch, I think we're 80% of the way to a solution that combines both forms.
Give me the green light, and I'll hack on it.
Pierre-Yves David - March 24, 2015, 11:46 p.m.
On 03/23/2015 09:18 AM, Gregory Szorc wrote:
> On Sat, Mar 21, 2015 at 6:33 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@fb.com
>     <mailto:pierre-yves.david@fb.com>>
>     # Date 1426968613 25200
>     #      Sat Mar 21 13:10:13 2015 -0700
>     # Node ID e7a02537f7084036e2dec01129e96d736f5ea706
>     # Parent  1cfded2fa1a92ee9b55d1f62675569e340a39083
>     tags: have a different cache file per filter level
>
>     Currently whichever filter level ask for tags last will write the
>     data on disk.
>     This create massive issue when tags are read for "visible" and
>     "unfiltered"
>     on large and multi headed repository (like Mozilla central). See
>     issue4550 for
>     details
>
>     We'll want the various cache level to collaborate together (same as want
>     branchhead cache are doing). But this is not done in this changesets.
>
>
> I strongly support a separate cache per filter. And, I think this patch
> is a step in the right direction and would address the immediate
> concerns in issue 4550.
>
> However, I have concerns:
>
> 1) Each cache will redundantly compute the same .hgtags filenode
> mappings for many heads. On large repos (in terms of files and heads),
> the performance penalty could be significant.

We could have a common cache, but just reusing the subset-cache data 
should make for most of the work isn't it?

> 2) I believe we still have lingering issues with cache invalidation when
> some history rewriting operations occur. The bulk of the performance
> issues are due to .hgtags filenode resolution. If we could make that
> mapping less prone to invalidation, it would eliminate most performance
> issues.

Such issue also affect branchhead cache. And the "fallback to 
subset-cache" have been serving us "good enough" so far. (And have 
plenty of room for improvement)

> 3) Since clients need to create the tags cache after clone and this
> could be expensive, I'd eventually like to "bootstrap" the process via a
> bundle2 part (containing probably just the .hgtags filenode mapping).
> This should be considered for any tags cache refactor.

This looks like a good idea. If the server have proper cache at hand it 
should send it. otherwise the client can recompute it after clone (maybe 
from incomplete server data?)

> 4) Given 1-3, if we adopt this patch as-is and refactor tags cache
> later, we'll have to create a new tags cache file implementation. Part
> of me thinks we should do all this at one time so there is 1 transition,
> not 2.

I think that the most pressing issue is to look at the tags algorithm 
and find a refactor it to allow subset collaboration (ala branchheagcache).

The splitting of the two stage in two file will be valuable, but having 
different cache per filter and having them collaborate is the first 
thing to do.
Gregory Szorc - March 25, 2015, 4:58 a.m.
On Tue, Mar 24, 2015 at 4:46 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 03/23/2015 09:18 AM, Gregory Szorc wrote:
>
>> On Sat, Mar 21, 2015 at 6:33 PM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>> wrote:
>>
>>     # HG changeset patch
>>     # User Pierre-Yves David <pierre-yves.david@fb.com
>>     <mailto:pierre-yves.david@fb.com>>
>>     # Date 1426968613 25200
>>     #      Sat Mar 21 13:10:13 2015 -0700
>>     # Node ID e7a02537f7084036e2dec01129e96d736f5ea706
>>     # Parent  1cfded2fa1a92ee9b55d1f62675569e340a39083
>>     tags: have a different cache file per filter level
>>
>>     Currently whichever filter level ask for tags last will write the
>>     data on disk.
>>     This create massive issue when tags are read for "visible" and
>>     "unfiltered"
>>     on large and multi headed repository (like Mozilla central). See
>>     issue4550 for
>>     details
>>
>>     We'll want the various cache level to collaborate together (same as
>> want
>>     branchhead cache are doing). But this is not done in this changesets.
>>
>>
>> I strongly support a separate cache per filter. And, I think this patch
>> is a step in the right direction and would address the immediate
>> concerns in issue 4550.
>>
>> However, I have concerns:
>>
>> 1) Each cache will redundantly compute the same .hgtags filenode
>> mappings for many heads. On large repos (in terms of files and heads),
>> the performance penalty could be significant.
>>
>
> We could have a common cache, but just reusing the subset-cache data
> should make for most of the work isn't it?
>
>  2) I believe we still have lingering issues with cache invalidation when
>> some history rewriting operations occur. The bulk of the performance
>> issues are due to .hgtags filenode resolution. If we could make that
>> mapping less prone to invalidation, it would eliminate most performance
>> issues.
>>
>
> Such issue also affect branchhead cache. And the "fallback to
> subset-cache" have been serving us "good enough" so far. (And have plenty
> of room for improvement)
>
>  3) Since clients need to create the tags cache after clone and this
>> could be expensive, I'd eventually like to "bootstrap" the process via a
>> bundle2 part (containing probably just the .hgtags filenode mapping).
>> This should be considered for any tags cache refactor.
>>
>
> This looks like a good idea. If the server have proper cache at hand it
> should send it. otherwise the client can recompute it after clone (maybe
> from incomplete server data?)
>
>  4) Given 1-3, if we adopt this patch as-is and refactor tags cache
>> later, we'll have to create a new tags cache file implementation. Part
>> of me thinks we should do all this at one time so there is 1 transition,
>> not 2.
>>
>
> I think that the most pressing issue is to look at the tags algorithm and
> find a refactor it to allow subset collaboration (ala branchheagcache).
>
> The splitting of the two stage in two file will be valuable, but having
> different cache per filter and having them collaborate is the first thing
> to do.
>

I just sent a patch that if landed before per-filter tags cache lands, will
enable us to transition to an external/shared .hgtags filenode cache in the
future without transition pain. As long as you feel the planned future
format contains enough info to e.g. do efficient cache validation checking,
you can land per-filter caches now for a nice win and I can land a shared
.hgtags filenode cache later for an even greater performance win. I think
it allows us both to achieve what each of us want without hindering the
other. Worst case, we decide on something completely different and my
attempts at future compatibility are wasted and we're back where we
started. i.e. there's not much to lose.

Patch

diff --git a/mercurial/tags.py b/mercurial/tags.py
--- a/mercurial/tags.py
+++ b/mercurial/tags.py
@@ -181,21 +181,28 @@  def _updatetags(filetags, tagtype, allta
 # working correctly.  And this gives the biggest performance win: it
 # avoids looking up .hgtags in the manifest for every head, and it
 # can avoid calling heads() at all if there have been no changes to
 # the repo.
 
+def _filename(repo):
+    """name of a tagcache file for a given repo or repoview"""
+    filename = 'cache/tags'
+    if repo.filtername:
+        filename = '%s-%s' % (filename, repo.filtername)
+    return filename
+
 def _readtagcache(ui, repo):
     '''Read the tag cache and return a tuple (heads, fnodes, cachetags,
     shouldwrite).  If the cache is completely up-to-date, cachetags is a
     dict of the form returned by _readtags(); otherwise, it is None and
     heads and fnodes are set.  In that case, heads is the list of all
     heads currently in the repository (ordered from tip to oldest) and
     fnodes is a mapping from head to .hgtags filenode.  If those two are
     set, caller is responsible for reading tag info from each head.'''
 
     try:
-        cachefile = repo.vfs('cache/tags', 'r')
+        cachefile = repo.vfs(_filename(repo), 'r')
         # force reading the file for static-http
         cachelines = iter(cachefile)
     except IOError:
         cachefile = None
 
@@ -226,11 +233,11 @@  def _readtagcache(ui, repo):
                 if len(line) == 3:
                     fnode = bin(line[2])
                     cachefnode[headnode] = fnode
         except Exception:
             # corruption of the tags cache, just recompute it
-            ui.warn(_('.hg/cache/tags is corrupt, rebuilding it\n'))
+            ui.warn(_('.hg/%s is corrupt, rebuilding it\n') % _filename(repo))
             cacheheads = []
             cacherevs = []
             cachefnode = {}
 
     tipnode = repo.changelog.tip()
@@ -301,11 +308,11 @@  def _readtagcache(ui, repo):
     return (repoheads, cachefnode, None, True)
 
 def _writetagcache(ui, repo, heads, tagfnode, cachetags):
 
     try:
-        cachefile = repo.vfs('cache/tags', 'w', atomictemp=True)
+        cachefile = repo.vfs(_filename(repo), 'w', atomictemp=True)
     except (OSError, IOError):
         return
 
     ui.log('tagscache', 'writing tags cache file with %d heads and %d tags\n',
             len(heads), len(cachetags))
diff --git a/tests/test-mq.t b/tests/test-mq.t
--- a/tests/test-mq.t
+++ b/tests/test-mq.t
@@ -309,28 +309,28 @@  qpop
 
 
 qpush with dump of tag cache
 Dump the tag cache to ensure that it has exactly one head after qpush.
 
-  $ rm -f .hg/cache/tags
+  $ rm -f .hg/cache/tags-visible
   $ hg tags > /dev/null
 
-.hg/cache/tags (pre qpush):
+.hg/cache/tags-visible (pre qpush):
 
-  $ cat .hg/cache/tags
+  $ cat .hg/cache/tags-visible
   1 [\da-f]{40} (re)
   
   $ hg qpush
   applying test.patch
   now at: test.patch
   $ hg phase -r qbase
   2: draft
   $ hg tags > /dev/null
 
-.hg/cache/tags (post qpush):
+.hg/cache/tags-visible (post qpush):
 
-  $ cat .hg/cache/tags
+  $ cat .hg/cache/tags-visible
   2 [\da-f]{40} (re)
   
   $ checkundo qpush
   $ cd ..
 
diff --git a/tests/test-obsolete-tag-cache.t b/tests/test-obsolete-tag-cache.t
--- a/tests/test-obsolete-tag-cache.t
+++ b/tests/test-obsolete-tag-cache.t
@@ -34,11 +34,11 @@  Trigger tags cache population by doing s
   | o  1:5f97d42da03f  test tag
   |/
   o  0:55482a6fb4b1 test1 initial
   
 
-  $ cat .hg/cache/tags
+  $ cat .hg/cache/tags-visible
   4 042eb6bfcc4909bad84a1cbf6eb1ddf0ab587d41
   3 c3cb30f2d2cd0aae008cc91a07876e3c5131fd22 b3bce87817fe7ac9dca2834366c1d7534c095cf1
   
   55482a6fb4b1881fa8f746fd52cf6f096bb21c89 test1
   d75775ffbc6bca1794d300f5571272879bd280da test2
@@ -60,11 +60,11 @@  repopulation
   
 
 .hgtags filenodes for hidden heads should be visible (issue4550)
 (currently broken)
 
-  $ cat .hg/cache/tags
+  $ cat .hg/cache/tags-visible
   7 eb610439e10e0c6b296f97b59624c2e24fc59e30 b3bce87817fe7ac9dca2834366c1d7534c095cf1
   
   55482a6fb4b1881fa8f746fd52cf6f096bb21c89 test1
   d75775ffbc6bca1794d300f5571272879bd280da test2
 
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -1,9 +1,9 @@ 
 Helper functions:
 
   $ cacheexists() {
-  >   [ -f .hg/cache/tags ] && echo "tag cache exists" || echo "no tag cache"
+  >   [ -f .hg/cache/tags-visible ] && echo "tag cache exists" || echo "no tag cache"
   > }
 
   $ dumptags() {
   >     rev=$1
   >     echo "rev $rev: .hgtags:"
@@ -34,13 +34,13 @@  Setup:
   $ cacheexists
   tag cache exists
 
 Try corrupting the cache
 
-  $ printf 'a b' > .hg/cache/tags
+  $ printf 'a b' > .hg/cache/tags-visible
   $ hg identify
-  .hg/cache/tags is corrupt, rebuilding it
+  .hg/cache/tags-visible is corrupt, rebuilding it
   acb14030fe0a tip
   $ cacheexists
   tag cache exists
   $ hg identify
   acb14030fe0a tip
@@ -67,18 +67,18 @@  Create a tag behind hg's back:
   $ hg identify
   b9154636be93 tip
 
 Repeat with cold tag cache:
 
-  $ rm -f .hg/cache/tags
+  $ rm -f .hg/cache/tags-visible
   $ hg identify
   b9154636be93 tip
 
 And again, but now unable to write tag cache:
 
 #if unix-permissions
-  $ rm -f .hg/cache/tags
+  $ rm -f .hg/cache/tags-visible
   $ chmod 555 .hg
   $ hg identify
   b9154636be93 tip
   $ chmod 755 .hg
 #endif
@@ -214,11 +214,11 @@  Detailed dump of tag info:
   rev 4: .hgtags:
   bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
 
 Dump cache:
 
-  $ cat .hg/cache/tags
+  $ cat .hg/cache/tags-visible
   4 0c192d7d5e6b78a714de54a2e9627952a877e25a 0c04f2a8af31de17fab7422878ee5a2dadbc943d
   3 6fa450212aeb2a21ed616a54aea39a4a27894cd7 7d3b718c964ef37b89e550ebdafd5789e76ce1b0
   2 7a94127795a33c10a370c93f731fd9fea0b79af6 0c04f2a8af31de17fab7422878ee5a2dadbc943d
   
   bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
@@ -326,11 +326,11 @@  Strip 2: destroy whole branch, no old he
   $ hg --config extensions.mq= strip 4
   saved backup bundle to $TESTTMP/t3/.hg/strip-backup/*-backup.hg (glob)
   $ hg tags                  # partly stale
   tip                                4:735c3ca72986
   bar                                0:bbd179dfa0a7
-  $ rm -f .hg/cache/tags
+  $ rm -f .hg/cache/tags-visible
   $ hg tags                  # cold cache
   tip                                4:735c3ca72986
   bar                                0:bbd179dfa0a7
 
 Test tag rank with 3 heads: