Patchwork [2,of,2] tags: detect and repair corrupt tags portion of tags cache

login
register
mail settings
Submitter Gregory Szorc
Date March 30, 2015, 12:10 a.m.
Message ID <2cb6b78cf818a082ee9a.1427674218@vm-ubuntu-main.gateway.sonic.net>
Download mbox | patch
Permalink /patch/8364/
State Changes Requested
Headers show

Comments

Gregory Szorc - March 30, 2015, 12:10 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1427672229 25200
#      Sun Mar 29 16:37:09 2015 -0700
# Node ID 2cb6b78cf818a082ee9a2518968f88c1819efed0
# Parent  2e21d3312350ce63785cda82526c951211e76bab
tags: detect and repair corrupt tags portion of tags cache

As part of developing a tags cache patch, I inadvertently introduced
malformed tags entries in a tags cache file. I was surprised to learn
that Mercurial would let this malformed data linger and would print
warnings about it until the tags cache was updated.

This patch changes the behavior so in invalid tags cache is
proactively repaired. This makes the end-user experience better, as it
prevents Mercurial from getting in a possibly endless loop of warning
about a malformed tags cache.
Matt Mackall - March 31, 2015, 1:29 p.m.
On Sun, 2015-03-29 at 17:10 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1427672229 25200
> #      Sun Mar 29 16:37:09 2015 -0700
> # Node ID 2cb6b78cf818a082ee9a2518968f88c1819efed0
> # Parent  2e21d3312350ce63785cda82526c951211e76bab
> tags: detect and repair corrupt tags portion of tags cache
> 
> As part of developing a tags cache patch, I inadvertently introduced
> malformed tags entries in a tags cache file. I was surprised to learn
> that Mercurial would let this malformed data linger and would print
> warnings about it until the tags cache was updated.
> 
> This patch changes the behavior so in invalid tags cache is
> proactively repaired. This makes the end-user experience better, as it
> prevents Mercurial from getting in a possibly endless loop of warning
> about a malformed tags cache.

Seems sensible, but see my previous note about how we should be silent
when doing this.
Gregory Szorc - March 31, 2015, 6:43 p.m.
On Tue, Mar 31, 2015 at 6:29 AM, Matt Mackall <mpm@selenic.com> wrote:

> On Sun, 2015-03-29 at 17:10 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1427672229 25200
> > #      Sun Mar 29 16:37:09 2015 -0700
> > # Node ID 2cb6b78cf818a082ee9a2518968f88c1819efed0
> > # Parent  2e21d3312350ce63785cda82526c951211e76bab
> > tags: detect and repair corrupt tags portion of tags cache
> >
> > As part of developing a tags cache patch, I inadvertently introduced
> > malformed tags entries in a tags cache file. I was surprised to learn
> > that Mercurial would let this malformed data linger and would print
> > warnings about it until the tags cache was updated.
> >
> > This patch changes the behavior so in invalid tags cache is
> > proactively repaired. This makes the end-user experience better, as it
> > prevents Mercurial from getting in a possibly endless loop of warning
> > about a malformed tags cache.
>
> Seems sensible, but see my previous note about how we should be silent
> when doing this.
>

I was going to say that there is precedence for this and I was just
following precedent. But I see you've just committed the removal of the
precedent. I'll update this patch.

Patch

diff --git a/mercurial/tags.py b/mercurial/tags.py
--- a/mercurial/tags.py
+++ b/mercurial/tags.py
@@ -100,9 +100,10 @@  def findglobaltags(ui, repo, alltags, ta
                 fctx = repo.filectx('.hgtags', fileid=fnode)
             else:
                 fctx = fctx.filectx(fnode)
 
-            filetags = _readtags(ui, repo, fctx.data().splitlines(), fctx)
+            filetags, warned = _readtags(ui, repo, fctx.data().splitlines(),
+                                         fctx)
             _updatetags(filetags, 'global', alltags, tagtypes)
 
     # and update the cache (if necessary)
     if shouldwrite:
@@ -118,9 +119,9 @@  def readlocaltags(ui, repo, alltags, tag
         return
 
     # localtags is in the local encoding; re-encode to UTF-8 on
     # input for consistency with the rest of this module.
-    filetags = _readtags(
+    filetags, warned = _readtags(
         ui, repo, data.splitlines(), "localtags",
         recode=encoding.fromlocal)
 
     # remove tags pointing to invalid nodes
@@ -135,9 +136,10 @@  def readlocaltags(ui, repo, alltags, tag
 
 def _readtaghist(ui, repo, lines, fn, recode=None, calcnodelines=False):
     '''Read tag definitions from a file (or any source of lines).
 
-    This function returns two sortdicts with similar information:
+    This function returns a 3-tuple. The first two items are sortdics with
+    similar information:
 
     - the first dict, bintaghist, contains the tag information as expected by
       the _readtags function, i.e. a mapping from tag name to (node, hist):
         - node is the node id from the last line read for that name,
@@ -146,16 +148,20 @@  def _readtaghist(ui, repo, lines, fn, re
 
     - the second dict, hextaglines, is a mapping from tag name to a list of
       [hexnode, line number] pairs, ordered from the oldest to the newest node.
 
+    The third item is a bool indicating whether warnings were encountered. This
+    is intended to be used as a signal to force cache re-rewriting.
+
     When calcnodelines is False the hextaglines dict is not calculated (an
     empty dict is returned). This is done to improve this function's
     performance in cases where the line numbers are not needed.
     '''
 
     bintaghist = util.sortdict()
     hextaglines = util.sortdict()
     count = 0
+    warned = False
 
     def warn(msg):
         ui.warn(_("%s, line %s: %s\n") % (fn, count, msg))
 
@@ -166,16 +172,18 @@  def _readtaghist(ui, repo, lines, fn, re
         try:
             (nodehex, name) = line.split(" ", 1)
         except ValueError:
             warn(_("cannot parse entry"))
+            warned = True
             continue
         name = name.strip()
         if recode:
             name = recode(name)
         try:
             nodebin = bin(nodehex)
         except TypeError:
             warn(_("node '%s' is not well formed") % nodehex)
+            warned = True
             continue
 
         # update filetags
         if calcnodelines:
@@ -187,9 +195,9 @@  def _readtaghist(ui, repo, lines, fn, re
         # map tag name to (node, hist)
         if name not in bintaghist:
             bintaghist[name] = []
         bintaghist[name].append(nodebin)
-    return bintaghist, hextaglines
+    return bintaghist, hextaglines, warned
 
 def _readtags(ui, repo, lines, fn, recode=None, calcnodelines=False):
     '''Read tag definitions from a file (or any source of lines).
 
@@ -198,13 +206,14 @@  def _readtags(ui, repo, lines, fn, recod
     "node" is the node id from the last line read for that name. "hist"
     is the list of node ids previously associated with it (in file order).
     All node ids are binary, not hex.
     '''
-    filetags, nodelines = _readtaghist(ui, repo, lines, fn, recode=recode,
-                                       calcnodelines=calcnodelines)
+    filetags, nodelines, warned = _readtaghist(ui, repo, lines, fn,
+                                               recode=recode,
+                                               calcnodelines=calcnodelines)
     for tag, taghist in filetags.items():
         filetags[tag] = (taghist[-1], taghist[:-1])
-    return filetags
+    return filetags, warned
 
 def _updatetags(filetags, tagtype, alltags, tagtypes):
     '''Incorporate the tag info read from one file into the two
     dictionaries, alltags and tagtypes, that contain all tag
@@ -284,11 +293,15 @@  def _readtagcache(ui, repo):
     # (Unchanged tip trivially means no changesets have been added.
     # But, thanks to localrepository.destroyed(), it also means none
     # have been destroyed by strip or rollback.)
     if cacheheads and cacheheads[0] == tipnode and cacherevs[0] == tiprev:
-        tags = _readtags(ui, repo, cachelines, cachefile.name)
+        tags, warned = _readtags(ui, repo, cachelines, cachefile.name)
         cachefile.close()
-        return (None, None, tags, False)
+        cachefile = None
+        # If tag reading had issues, fall through and repair the cache.
+        if not warned:
+            return (None, None, tags, False)
+        ui.warn(_('.hg/cache/tags is corrupt; rebuilding it\n'))
     if cachefile:
         cachefile.close()               # ignore rest of file
 
     repoheads = repo.heads()
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -157,8 +157,55 @@  Report tag parse error on other head:
   .hgtags@75d9f02dfe28, line 4: node 'foo' is not well formed
   .hgtags@c4be69a18c11, line 2: node 'x' is not well formed
   tip                                8:c4be69a18c11
   first                              0:acb14030fe0a
+
+  $ cat .hg/cache/tags
+  8 c4be69a18c11e8bc3a5fdbb576017c25f7d84663 9876b1193cfc564b1518d1f1b4459028ec75bf18
+  7 75d9f02dfe2874aa938ee8c18fa27c1328cfb023 7371bc5168f70e1b7c8dbf7c8bedf9d79f51dd82
+  
+  acb14030fe0a21b60322c440ad2d20cf7685a376 first
+
+  $ hg tip
+  changeset:   8:c4be69a18c11
+  tag:         tip
+  parent:      3:ac5e980c4dc0
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     head
+  
+
+Test a similar corruption issue in the tags cache
+We should see a warning when loading the tags cache and a repaired version
+should be written automatically.
+
+  $ cat > .hg/cache/tags << EOF
+  > 8 c4be69a18c11e8bc3a5fdbb576017c25f7d84663 9876b1193cfc564b1518d1f1b4459028ec75bf18
+  > 7 75d9f02dfe2874aa938ee8c18fa27c1328cfb023 7371bc5168f70e1b7c8dbf7c8bedf9d79f51dd82
+  > 
+  > zzabc123 first
+  > EOF
+
+  $ hg tip
+  changeset:   8:c4be69a18c11
+  $TESTTMP/t/.hg/cache/tags, line 1: node 'zzabc123' is not well formed
+  .hg/cache/tags is corrupt; rebuilding it
+  .hgtags@75d9f02dfe28, line 2: cannot parse entry
+  .hgtags@75d9f02dfe28, line 4: node 'foo' is not well formed
+  .hgtags@c4be69a18c11, line 2: node 'x' is not well formed
+  tag:         tip
+  parent:      3:ac5e980c4dc0
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     head
+  
+
+  $ cat .hg/cache/tags
+  8 c4be69a18c11e8bc3a5fdbb576017c25f7d84663 9876b1193cfc564b1518d1f1b4459028ec75bf18
+  7 75d9f02dfe2874aa938ee8c18fa27c1328cfb023 7371bc5168f70e1b7c8dbf7c8bedf9d79f51dd82
+  
+  acb14030fe0a21b60322c440ad2d20cf7685a376 first
+
   $ hg tip
   changeset:   8:c4be69a18c11
   tag:         tip
   parent:      3:ac5e980c4dc0