Patchwork [4,of,4,tags-cache-split] tags: periodically prune .hgtags fnodes cache

login
register
mail settings
Submitter Gregory Szorc
Date March 30, 2015, 6:14 a.m.
Message ID <d445bbf3a325823f5240.1427696046@vm-ubuntu-main.gateway.sonic.net>
Download mbox | patch
Permalink /patch/8369/
State Accepted
Headers show

Comments

Gregory Szorc - March 30, 2015, 6:14 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1427695476 25200
#      Sun Mar 29 23:04:36 2015 -0700
# Node ID d445bbf3a325823f5240f5dfbf0d553e02eca0d9
# Parent  9b396004b4338aa3e011cbf4f13800a3c55b1ab6
tags: periodically prune .hgtags fnodes cache

The newly introduced .hgtags fnodes cache could grow without bounds.
Unbounded growth is bad and can lead to performance problems. This
patch adds periodic pruning of this cache via random chance.

As is currently implemented, the .hgtags fnodes cache will get pruned
once every 1000 saves. Why 1000? It seemed like not too little and not
too much.

On a Firefox repository on my machine, pruning the cache takes
0.10 to 0.15s. This is short enough that it is tempting to perform
the pruning on every save. However, since pruning the cache requires
finding heads for every filter, I suspect people with repos of
more than 200,000 commits may take exception to this. We can always
change the logic later once we observe behavior in the wild.

Some may take exception to the random behavior introduced here. I'm
not a huge fan of it either. I like determinism too. However, I'm not
sure what alternatives there are. I considered checking whether the
total head count is wholely divisible by a number. But I'm not
convinced we would reliably his this condition because the tags cache
isn't saved with any reliability. Therefore, its entirely possible
for prunes to get skipped if e.g. the count of entries jumps by N>1.
If anyone has any better ideas, I'm receptive to implementing them.

Patch

diff --git a/mercurial/tags.py b/mercurial/tags.py
--- a/mercurial/tags.py
+++ b/mercurial/tags.py
@@ -15,8 +15,9 @@  from i18n import _
 import util
 import encoding
 import error
 import errno
+import random
 import time
 
 # The tags cache stores information about heads and the history of tags.
 #
@@ -510,11 +511,42 @@  def _readhgtagsfnodescache(ui, repo):
 
 def _updatehgtagsfnodescache(ui, repo, existing, fnodes):
     """Update the cache file mapping changeset nodes to .hgtags fnodes.
 
-    For now, all entries are preserved for all of time. In the future, we
-    should consider pruning this cache so its growth isn't unbounded.
+    By default, new entries are appended to the cache. This means entries
+    can accumulate forever, even if they are no longer heads and not relevant
+    to the tags cache. We combat this by randomly examining the full set of
+    changesets for relevant heads and re-writing the cache from scratch.
     """
+    # 1/1000 is infrequent enough to not have an impact on performance but
+    # frequent enough that the cache should never get too large. 1000 entries
+    # is 20 x 2 x 1000 = 40,000 bytes, which is likely smaller for repos that
+    # have enough activity to trigger this.
+    chance = ui.configint('tags', 'prunechance', 1000)
+    prune = chance >= 0 and random.randint(0, chance) == 0
+
+    if prune:
+        starttime = time.time()
+        allfnodes = dict(existing)
+        allfnodes.update(fnodes)
+        relevant = _relevanthgtagsfnodes(repo, allfnodes)
+        duration = time.time() - starttime
+        ui.log('tagscache',
+               'pruned %d to %d .hgtags fnodes entries in %0.4f seconds\n',
+               len(allfnodes), len(relevant), duration)
+
+        entries = []
+        for head, node in sorted(relevant.items()):
+            entries.append('%s%s' % (head, node))
+        data = ''.join(entries)
+
+        try:
+            repo.vfs.write(_fnodescachefile, data)
+        except (IOError, OSError):
+            pass
+
+        return
+
     # Append new entries instead of re-writing existing content to reduce
     # I/O.
     missing = set(fnodes.keys()) - set(existing.keys())
 
@@ -532,4 +564,27 @@  def _updatehgtagsfnodescache(ui, repo, e
 
     ui.log('tagscache',
            'appended %d entries to hgtags filenodes cache\n',
            len(missing))
+
+def _relevanthgtagsfnodes(repo, existing):
+    """Obtain a mapping of relevent .hgtags fnodes for the repository.
+
+    Given a dict of existing fnodes in the cache and a set of incoming fnodes,
+    obtain a dictionary containing only the fnodes that belong to a head on a
+    repo filter.
+    """
+    import repoview # avoid cycle
+
+    repo = repo.unfiltered()
+    heads = set()
+    heads |= set(repo.heads())
+    for fltr in repoview.filtertable:
+        filtered = repo.filtered(fltr)
+        heads |= set(filtered.heads())
+
+    d = {}
+    for head, fnode in existing.iteritems():
+        if head in heads:
+            d[head] = fnode
+
+    return d
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -1,4 +1,30 @@ 
+Remove randomness from hgtags fnodes pruning
+
+  $ cat > mock.py << EOF
+  > from mercurial import util
+  > 
+  > def makedate():
+  >     return 0, 0
+  > def getuser():
+  >     return 'bob'
+  > def uisetup(ui):
+  >     util.makedate = makedate
+  >     util.getuser = getuser
+  > EOF
+
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > blackbox = 
+  > mock = `pwd`/mock.py
+  > 
+  > [blackbox]
+  > track = *
+  > 
+  > [tags]
+  > prunechance = -1
+  > EOF
+
 Helper functions:
 
   $ cacheexists() {
   >   [ -f .hg/cache/tags1-visible ] && echo "tag cache exists" || echo "no tag cache"
@@ -381,8 +407,49 @@  Errors writing to .hgtags fnodes cache a
 
   $ chmod a+w .hg/cache/hgtagsfnodes1
 #endif
 
+Non-heads should disappear when a prune is performed
+
+  $ rm .hg/cache/tags1-visible
+  $ hg tags
+  tip                                6:039af0ff94d0
+  bar                                1:78391a272241
+  $ f --size --hexdump .hg/cache/hgtagsfnodes1
+  .hg/cache/hgtagsfnodes1: size=120
+  0000: 03 9a f0 ff 94 d0 d4 82 7b 5c 22 fc 52 48 db 22 |........{\".RH."|
+  0010: 9b 17 cf 29 0c 04 f2 a8 af 31 de 17 fa b7 42 28 |...).....1....B(|
+  0020: 78 ee 5a 2d ad bc 94 3d 6f a4 50 21 2a eb 2a 21 |x.Z-...=o.P!*.*!|
+  0030: ed 61 6a 54 ae a3 9a 4a 27 89 4c d7 7d 3b 71 8c |.ajT...J'.L.};q.|
+  0040: 96 4e f3 7b 89 e5 50 eb da fd 57 89 e7 6c e1 b0 |.N.{..P...W..l..|
+  0050: 7a 94 12 77 95 a3 3c 10 a3 70 c9 3f 73 1f d9 fe |z..w..<..p.?s...|
+  0060: a0 b7 9a f6 0c 04 f2 a8 af 31 de 17 fa b7 42 28 |.........1....B(|
+  0070: 78 ee 5a 2d ad bc 94 3d                         |x.Z-...=|
+
+  $ echo newhead > foo
+  $ hg commit -m newhead
+  $ hg --config tags.prunechance=0 tags
+  tip                                7:1b0ab2d0c284
+  bar                                1:78391a272241
+  $ f --size --hexdump .hg/cache/hgtagsfnodes1
+  .hg/cache/hgtagsfnodes1: size=120
+  0000: 1b 0a b2 d0 c2 84 4c 0c 34 c9 e3 fc 94 b9 d0 02 |......L.4.......|
+  0010: 8b e9 6f f8 0c 04 f2 a8 af 31 de 17 fa b7 42 28 |..o......1....B(|
+  0020: 78 ee 5a 2d ad bc 94 3d 6f a4 50 21 2a eb 2a 21 |x.Z-...=o.P!*.*!|
+  0030: ed 61 6a 54 ae a3 9a 4a 27 89 4c d7 7d 3b 71 8c |.ajT...J'.L.};q.|
+  0040: 96 4e f3 7b 89 e5 50 eb da fd 57 89 e7 6c e1 b0 |.N.{..P...W..l..|
+  0050: 7a 94 12 77 95 a3 3c 10 a3 70 c9 3f 73 1f d9 fe |z..w..<..p.?s...|
+  0060: a0 b7 9a f6 0c 04 f2 a8 af 31 de 17 fa b7 42 28 |.........1....B(|
+  0070: 78 ee 5a 2d ad bc 94 3d                         |x.Z-...=|
+
+  $ hg blackbox -l 6
+  1970/01/01 00:00:00 bob> tags
+  1970/01/01 00:00:00 bob> read 3 entries from hgtags filenodes cache
+  1970/01/01 00:00:00 bob> resolved 3 tags cache entries from 1 manifests in * seconds (glob)
+  1970/01/01 00:00:00 bob> writing tags cache file with 3 heads and 1 tags
+  1970/01/01 00:00:00 bob> pruned 4 to 3 .hgtags fnodes entries in * seconds (glob)
+  1970/01/01 00:00:00 bob> --config tags.prunechance=0 tags exited 0 after * seconds (glob)
+
   $ hg -q --config extensions.strip= strip -r 5: --no-backup
 
 Test tag removal: