Patchwork [20 of 21 RFC] censor: new "record" function commits to .hgcensored, returns tombstone data

Submitter michaeljedgar@gmail.com
Date Sept. 11, 2014, 12:26 a.m.
Message ID <eff2398c409bcd7b7033.1410395181@adgar-macbookpro3.roam.corp.google.com>
Permalink /patch/5795/
State Changes Requested

Comments

michaeljedgar@gmail.com - Sept. 11, 2014, 12:26 a.m.
# HG changeset patch
# User Mike Edgar <adgar@google.com>
# Date 1410323971 14400
#      Wed Sep 10 00:39:31 2014 -0400
# Node ID eff2398c409bcd7b70335f57563095fb63adf270
# Parent  41b44791ab7feaeceaaf4b1031564526d497e945
censor: new "record" function commits to .hgcensored, returns tombstone data

Verifiable censorship requires a tracked whitelist of file revisions which are
expected to fail integrity checks. Commits to the whitelist should be part of
the usual repository history: nontrivial descendants of such a commit signal
trust in a particular choice to excise history.
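A tracked whitelist like this is only useful if integrity checking can consult it. The sketch below is a minimal, hypothetical model of that lookup. It assumes the one-entry-per-line "path hexnode" format that `record` writes further down. The names `parse_whitelist` and `expected_to_fail` are illustrative and are not Mercurial APIs.

```python
def parse_whitelist(text):
    """Return the set of (path, hexnode) pairs listed in whitelist text.

    Hypothetical format (as written by `record`): one "path hexnode" per line.
    """
    entries = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last space so that paths containing spaces still parse.
        path, _, hexnode = line.rpartition(' ')
        entries.add((path, hexnode))
    return entries


def expected_to_fail(whitelist_text, path, hexnode):
    """True if this file revision is whitelisted, i.e. a failed hash check is expected."""
    return (path, hexnode) in parse_whitelist(whitelist_text)
```

A verifier built this way would still report any integrity failure that is not listed, so only revisions committed to the whitelist are excused.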

Durham Goode - Sept. 11, 2014, 2:09 a.m.
On 9/10/14, 5:26 PM, michaeljedgar@gmail.com wrote:
> # HG changeset patch
> # User Mike Edgar <adgar@google.com>
> # Date 1410323971 14400
> #      Wed Sep 10 00:39:31 2014 -0400
> # Node ID eff2398c409bcd7b70335f57563095fb63adf270
> # Parent  41b44791ab7feaeceaaf4b1031564526d497e945
> censor: new "record" function commits to .hgcensored, returns tombstone data
>
> +def record(repo, f, fnode, message, user, date):
> +    """Record censoring the file `f` at its filelog revision `fnode`"""
> +    entry = '%s %s' % (repo.store.encode(f), hex(fnode))
>
I'm not sure you need to encode the file path here. Every filelog has a
specific casing, and the encoding is only used to disambiguate when
reading filelogs off disk. So you can assume the path here has the
appropriate casing and write the string directly, just as the manifest
stores file paths without encoding them. In other words, writing the
string directly to .hgcensored doesn't lose any casing data, so encoding
it seems superfluous.
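The argument rests on the store encoding being a reversible transform of the path. The sketch below is a deliberately simplified, hypothetical model of only the case-folding part of such an encoding: uppercase letters become `_` plus the lowercase letter, and `_` becomes `__`. Mercurial's real store encoder also escapes other characters and is not reproduced here. Because the mapping round-trips, the raw path already carries all of the casing information, and the encoded form only matters for naming files on a case-insensitive filesystem.

```python
def encode_case(path):
    """Simplified case-folding encoding (illustrative only): 'A' -> '_a', '_' -> '__'."""
    out = []
    for ch in path:
        if 'A' <= ch <= 'Z':
            out.append('_' + ch.lower())
        elif ch == '_':
            out.append('__')
        else:
            out.append(ch)
    return ''.join(out)


def decode_case(encoded):
    """Invert encode_case; assumes input produced by encode_case (no lone '_')."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == '_':
            nxt = encoded[i + 1]
            # '__' decodes to '_'; '_x' decodes to 'X'.
            out.append('_' if nxt == '_' else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)
```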

You probably also need to sanitize the path given to the censor command
before saving it to .hgcensored, to make sure it exactly matches an
existing filelog's casing rather than just accepting what the user typed.
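One way to do that sanitizing is to resolve the typed path against the set of tracked paths, for example taken from a manifest. The sketch below is hypothetical: the `canonical_path` helper and its policy are not from the patch series. It prefers an exact match. It falls back to a case-insensitive match only when that match is unique, and it rejects anything else. A stricter policy would accept exact matches only.

```python
def canonical_path(typed, tracked):
    """Return the tracked path whose casing `typed` should be normalised to.

    Hypothetical policy: exact match wins; otherwise accept a unique
    case-insensitive match; raise LookupError if none or ambiguous.
    """
    if typed in tracked:
        return typed
    folded = typed.lower()
    matches = sorted(p for p in tracked if p.lower() == folded)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError('no tracked file matches %r' % typed)
    raise LookupError('ambiguous path %r: %s' % (typed, ', '.join(matches)))
```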

Not encoding here would get rid of patch 19.

Patch

diff -r 41b44791ab7f -r eff2398c409b mercurial/censor.py
--- a/mercurial/censor.py	Wed Sep 10 00:08:37 2014 -0400
+++ b/mercurial/censor.py	Wed Sep 10 00:39:31 2014 -0400
@@ -4,7 +4,8 @@ 
 # GNU General Public License version 2 or any later version.
 
 from node import bin, hex
-import filelog
+import filelog, match
+import bisect, errno
 
 def allowed(repo, f, node):
     """Return whether the given file may be censored at the given revision."""
@@ -27,3 +28,28 @@ 
         return expected in censored
     else:
         return False
+
+def record(repo, f, fnode, message, user, date):
+    """Record censoring the file `f` at its filelog revision `fnode`"""
+    entry = '%s %s' % (repo.store.encode(f), hex(fnode))
+    try:
+        fp = repo.wfile('.hgcensored', 'rb+')
+        lines = fp.read().splitlines()
+        if entry not in lines:
+            bisect.insort(lines, entry)
+            fp.seek(0)
+            for line in lines:
+                fp.write(line + '\n')
+    except IOError, e:
+        if e.errno != errno.ENOENT:
+            raise
+        fp = repo.wfile('.hgcensored', 'ab')
+        fp.write(entry)
+    fp.close()
+
+    repo.invalidatecaches()
+    if '.hgcensored' not in repo.dirstate:
+        repo[None].add(['.hgcensored'])
+    m = match.exact(repo.root, '', ['.hgcensored'])
+    censormeta = {"censored": hex(repo.commit(message, user, date, match=m))}
+    return filelog.packmeta(censormeta, '')
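The two data transformations inside `record` can be separated from file I/O and committing, which makes them easier to test. The sketch below is illustrative and is not part of the patch. `add_entry` computes the complete new `.hgcensored` text. It keeps entries sorted, leaves an existing entry alone, and always ends every line with `'\n'`. `pack_tombstone` assumes the `filelog.packmeta` framing of that era: `"\1\n"`, then sorted `key: value` lines, then `"\1\n"`, then the (here empty) text.

```python
import bisect


def add_entry(text, path, hexnode):
    """Return whitelist text with 'path hexnode' inserted in sorted order.

    Adding an entry that is already present only normalises line endings.
    """
    entry = '%s %s' % (path, hexnode)
    lines = text.splitlines()
    if entry not in lines:
        bisect.insort(lines, entry)
    # Every line, including the last, ends with '\n'.
    return ''.join(line + '\n' for line in lines)


def pack_tombstone(censoring_hexnode):
    """Build tombstone metadata like packmeta({'censored': node}, '') (assumed framing)."""
    meta = {'censored': censoring_hexnode}
    metatext = ''.join('%s: %s\n' % (k, meta[k]) for k in sorted(meta))
    return '\x01\n%s\x01\n' % metatext
```

Computing the whole new text first means the caller can write it back in one step, for example by truncating before writing. By contrast, the patch's seek-and-overwrite is only safe while the rewritten text is never shorter than the file already on disk. The patch's ENOENT branch also writes the first entry without a trailing newline. `add_entry` normalises that case.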