Patchwork [2,of,7,RFC,v2] filelog: raise CensoredNodeError when hash checks fail with censor metadata

login
register
mail settings
Submitter michaeljedgar@gmail.com
Date Sept. 15, 2014, 12:01 p.m.
Message ID <79d3c249a0761c95e3cf.1410782483@adgar-macbookpro3.roam.corp.google.com>
Download mbox | patch
Permalink /patch/5821/
State Accepted
Headers show

Comments

michaeljedgar@gmail.com - Sept. 15, 2014, 12:01 p.m.
# HG changeset patch
# User Mike Edgar <adgar@google.com>
# Date 1409796860 14400
#      Wed Sep 03 22:14:20 2014 -0400
# Node ID 79d3c249a0761c95e3cf885159d4828d73bb4074
# Parent  0dc057ad6ad1e9ab481d461c426e72965d951445
filelog: raise CensoredNodeError when hash checks fail with censor metadata

With this change, when a revlog revision hash does not match its content, and
the content is empty with a special metadata key, the integrity failure is
assumed to be intentionally caused to remove sensitive content from repository
history.

To allow different Mercurial functionality to handle this scenario differently
a more specific exception is raised than "ordinary" hash failures.

Alternatives to this approach include, but are not limited to:

- Calling a hook when hashes mismatch to allow arbitrary tombstone validation.
  Cons: Irresponsibly easy to disable integrity checking altogether.
- Returning empty revision data eagerly instead of raising, masking the error.
  Cons: Push/pull won't roundtrip the tombstone, so client repos are unusable.
- Doing nothing differently at this layer. Callers must do their own detection
  of tombstoned data if they want to handle some hash checks and not others.
  - Impacts dozens of callsites, many of which don't have the revision data
  - Would probably be missing one or two callsites at any given time
  - Currently we throw a RevlogError, as do 12 other places in revlog.py.
    Callers would need to parse the exception message and/or ensure
    RevlogError is not thrown from any other part of their call tree.

Patch

diff -r 0dc057ad6ad1 -r 79d3c249a076 mercurial/filelog.py
--- a/mercurial/filelog.py	Wed Sep 03 15:59:03 2014 -0400
+++ b/mercurial/filelog.py	Wed Sep 03 22:14:20 2014 -0400
@@ -5,7 +5,7 @@ 
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
-import revlog
+import error, revlog
 import re
 
 _mdre = re.compile('\1\n')
@@ -27,6 +27,10 @@ 
     metatext = "".join("%s: %s\n" % (k, meta[k]) for k in keys)
     return "\1\n%s\1\n%s" % (metatext, text)
 
+def _censoredtext(text):
+    m, offs = parsemeta(text)
+    return m and "censored" in m and not text[offs:]
+
 class filelog(revlog.revlog):
     def __init__(self, opener, path):
         super(filelog, self).__init__(opener,
@@ -86,5 +90,13 @@ 
 
         return True
 
+    def checkhash(self, text, p1, p2, node, rev=None):
+        try:
+            super(filelog, self).checkhash(text, p1, p2, node, rev=rev)
+        except error.RevlogError:
+            if _censoredtext(text):
+                raise error.CensoredNodeError(self.indexfile, node)
+            raise
+
     def _file(self, f):
         return filelog(self.opener, f)