Patchwork [censor,RFC] revlog: addgroup checks if incoming deltas add censored revs, sets flag bit

login
register
mail settings
Submitter adgar@google.com
Date March 9, 2015, 7:56 p.m.
Message ID <816f69e12902d419421b.1425931010@adgar.nyc.corp.google.com>
Download mbox | patch
Permalink /patch/7953/
State Accepted
Commit 4bfe9f2d9761160122968c2a96824d964fb6e3e9
Delegated to: Pierre-Yves David
Headers show

Comments

adgar@google.com - March 9, 2015, 7:56 p.m.
# HG changeset patch
# User Mike Edgar <adgar@google.com>
# Date 1421266568 18000
#      Wed Jan 14 15:16:08 2015 -0500
# Node ID 816f69e12902d419421bb148705f7bc65e0f7774
# Parent  02d7b5cd373bbb4e8263dad9bfbf9c4c3b0e4e3a
revlog: addgroup checks if incoming deltas add censored revs, sets flag bit

A censored revision stored in a revlog should have the censored revlog index
flag bit set. This implies we must know if a revision is censored before we
add it to the revlog. When adding revisions from exchanged deltas, we would
prefer to determine this flag without decoding every single full text.

This change introduces a heuristic based on assumptions around the Mercurial
delta format and filelog metadata. Since deltas which produce a censored
revision must be full-replacement deltas, we can read the delta's first bytes
to check the filelog metadata. Since "censored" is the alphabetically first
filelog metadata key, censored filelog revisions have a well-known prefix we
can look for.

For more on the design and background of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
Pierre-Yves David - March 10, 2015, 2:37 a.m.
On 03/09/2015 12:56 PM, Mike Edgar wrote:
> # HG changeset patch
> # User Mike Edgar <adgar@google.com>
> # Date 1421266568 18000
> #      Wed Jan 14 15:16:08 2015 -0500
> # Node ID 816f69e12902d419421bb148705f7bc65e0f7774
> # Parent  02d7b5cd373bbb4e8263dad9bfbf9c4c3b0e4e3a
> revlog: addgroup checks if incoming deltas add censored revs, sets flag bit
>
> A censored revision stored in a revlog should have the censored revlog index
> flag bit set. This implies we must know if a revision is censored before we
> add it to the revlog. When adding revisions from exchanged deltas, we would
> prefer to determine this flag without decoding every single full text.
>
> This change introduces a heuristic based on assumptions around the Mercurial
> delta format and filelog metadata. Since deltas which produce a censored
> revision must be full-replacement deltas, we can read the delta's first bytes
> to check the filelog metadata. Since "censored" is the alphabetically first
> filelog metadata key, censored filelog revisions have a well-known prefix we
> can look for.
>
> For more on the design and background of the censorship feature, see:
> http://mercurial.selenic.com/wiki/CensorPlan


This approach looks like like a recipe for disaster. (expecially the one 
relying on "censored" being the first entry by alphabetical order.

What do you miss in the protocol to makes it happen ? as far as I 
understand the censored feature requires bundle2 to be used. Bundle2 is 
not the default yet so you have a windows for protocol improvement.

Though?
adgar@google.com - March 10, 2015, 3:23 a.m.
On Mon, Mar 9, 2015 at 10:37 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 03/09/2015 12:56 PM, Mike Edgar wrote:
>
>> # HG changeset patch
>> # User Mike Edgar <adgar@google.com>
>> # Date 1421266568 18000
>> #      Wed Jan 14 15:16:08 2015 -0500
>> # Node ID 816f69e12902d419421bb148705f7bc65e0f7774
>> # Parent  02d7b5cd373bbb4e8263dad9bfbf9c4c3b0e4e3a
>> revlog: addgroup checks if incoming deltas add censored revs, sets flag
>> bit
>>
>> A censored revision stored in a revlog should have the censored revlog
>> index
>> flag bit set. This implies we must know if a revision is censored before
>> we
>> add it to the revlog. When adding revisions from exchanged deltas, we
>> would
>> prefer to determine this flag without decoding every single full text.
>>
>> This change introduces a heuristic based on assumptions around the
>> Mercurial
>> delta format and filelog metadata. Since deltas which produce a censored
>> revision must be full-replacement deltas, we can read the delta's first
>> bytes
>> to check the filelog metadata. Since "censored" is the alphabetically
>> first
>> filelog metadata key, censored filelog revisions have a well-known prefix
>> we
>> can look for.
>>
>> For more on the design and background of the censorship feature, see:
>> http://mercurial.selenic.com/wiki/CensorPlan
>>
>
>
> This approach looks like like a recipe for disaster. (expecially the one
> relying on "censored" being the first entry by alphabetical order.
>

Agreed it's suboptimal. Perhaps I could parse all the metadata,
line-by-line, instead? That might make this a precise approach and not just
a heuristic.


> What do you miss in the protocol to makes it happen ? as far as I
> understand the censored feature requires bundle2 to be used. Bundle2 is not
> the default yet so you have a windows for protocol improvement.
>

Bundle2 is not required, and in fact supporting older clients which don't
have bundle2 support is an explicit design goal. What's missing is
changegroups don't transfer the "flags" of a revision (since those flags
were unused until the censorship design).

A changegroup v3 could add a 16-bit flags field which obviates this
heuristic for new clients and servers, but supporting mercurial back to 1.x
is in scope for censorship.


> Though?
>
> --
> Pierre-Yves David
>
Pierre-Yves David - March 10, 2015, 4:01 a.m.
On 03/09/2015 08:23 PM, Michael Edgar wrote:
> On Mon, Mar 9, 2015 at 10:37 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 03/09/2015 12:56 PM, Mike Edgar wrote:
>
>         # HG changeset patch
>         # User Mike Edgar <adgar@google.com <mailto:adgar@google.com>>
>         # Date 1421266568 18000
>         #      Wed Jan 14 15:16:08 2015 -0500
>         # Node ID 816f69e12902d419421bb148705f7b__c65e0f7774
>         # Parent  02d7b5cd373bbb4e8263dad9bfbf9c__4c3b0e4e3a
>         revlog: addgroup checks if incoming deltas add censored revs,
>         sets flag bit
>
>         A censored revision stored in a revlog should have the censored
>         revlog index
>         flag bit set. This implies we must know if a revision is
>         censored before we
>         add it to the revlog. When adding revisions from exchanged
>         deltas, we would
>         prefer to determine this flag without decoding every single full
>         text.
>
>         This change introduces a heuristic based on assumptions around
>         the Mercurial
>         delta format and filelog metadata. Since deltas which produce a
>         censored
>         revision must be full-replacement deltas, we can read the
>         delta's first bytes
>         to check the filelog metadata. Since "censored" is the
>         alphabetically first
>         filelog metadata key, censored filelog revisions have a
>         well-known prefix we
>         can look for.
>
>         For more on the design and background of the censorship feature,
>         see:
>         http://mercurial.selenic.com/__wiki/CensorPlan
>         <http://mercurial.selenic.com/wiki/CensorPlan>
>
>
>
>     This approach looks like like a recipe for disaster. (expecially the
>     one relying on "censored" being the first entry by alphabetical order.
>
>
> Agreed it's suboptimal. Perhaps I could parse all the metadata,
> line-by-line, instead? That might make this a precise approach and not
> just a heuristic.

The final approach needs to be resistent to metadata extension. parsing 
all of them seems to fit this needs.

>     What do you miss in the protocol to makes it happen ? as far as I
>     understand the censored feature requires bundle2 to be used. Bundle2
>     is not the default yet so you have a windows for protocol improvement.
>
>
> Bundle2 is not required, and in fact supporting older clients which
> don't have bundle2 support is an explicit design goal.

Really do you have a pointer to something explaining how the feature 
works for older client (I'm curious now). And I'm a bit surprised that 
-you- care about older client.

> What's missing is
> changegroups don't transfer the "flags" of a revision (since those flags
> were unused until the censorship design).
>
> A changegroup v3 could add a 16-bit flags field which obviates this
> heuristic for new clients and servers,

If you can give a look at such changegroup v3, now is the right time.

> but supporting mercurial back to 1.x is in scope for censorship.

<puzzled look>

We should probably have and IRC/VC chat.
Matt Mackall - March 10, 2015, 10:05 p.m.
On Mon, 2015-03-09 at 15:56 -0400, Mike Edgar wrote:
> # HG changeset patch
> # User Mike Edgar <adgar@google.com>
> # Date 1421266568 18000
> #      Wed Jan 14 15:16:08 2015 -0500
> # Node ID 816f69e12902d419421bb148705f7bc65e0f7774
> # Parent  02d7b5cd373bbb4e8263dad9bfbf9c4c3b0e4e3a
> revlog: addgroup checks if incoming deltas add censored revs, sets flag bit

Queued for default, thanks.

(Pierre-Yves' concerns were discussed on IRC.)

Patch

diff -r 02d7b5cd373b -r 816f69e12902 mercurial/filelog.py
--- a/mercurial/filelog.py	Tue Feb 10 15:59:12 2015 -0500
+++ b/mercurial/filelog.py	Wed Jan 14 15:16:08 2015 -0500
@@ -5,8 +5,8 @@ 
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
-import error, revlog
-import re
+import error, mdiff, revlog
+import re, struct
 
 _mdre = re.compile('\1\n')
 def parsemeta(text):
@@ -107,3 +107,23 @@ 
     def iscensored(self, rev):
         """Check if a file revision is censored."""
         return self.flags(rev) & revlog.REVIDX_ISCENSORED
+
+    def _peek_iscensored(self, baserev, delta, flush):
+        """Quickly check if a delta produces a censored revision."""
+        # Fragile heuristic: unless new file meta keys are added alphabetically
+        # preceding "censored", all censored revisions are prefixed by
+        # "\1\ncensored:". A delta producing such a censored revision must be a
+        # full-replacement delta, so we inspect the first and only patch in the
+        # delta for this prefix.
+        hlen = struct.calcsize(">lll")
+        if len(delta) <= hlen:
+            return False
+
+        oldlen = self.rawsize(baserev)
+        newlen = len(delta) - hlen
+        if delta[:hlen] != mdiff.replacediffheader(oldlen, newlen):
+            return False
+
+        add = "\1\ncensored:"
+        addlen = len(add)
+        return newlen >= addlen and delta[hlen:hlen + addlen] == add
diff -r 02d7b5cd373b -r 816f69e12902 mercurial/revlog.py
--- a/mercurial/revlog.py	Tue Feb 10 15:59:12 2015 -0500
+++ b/mercurial/revlog.py	Wed Jan 14 15:16:08 2015 -0500
@@ -1386,7 +1386,10 @@ 
             transaction.add(self.indexfile, isize, r)
             transaction.add(self.datafile, end)
             dfh = self.opener(self.datafile, "a")
-
+        def flush():
+            if dfh:
+                dfh.flush()
+            ifh.flush()
         try:
             # loop through our set of deltas
             chain = None
@@ -1430,9 +1433,13 @@ 
                         raise error.CensoredBaseError(self.indexfile,
                                                       self.node(baserev))
 
+                flags = REVIDX_DEFAULT_FLAGS
+                if self._peek_iscensored(baserev, delta, flush):
+                    flags |= REVIDX_ISCENSORED
+
                 chain = self._addrevision(node, None, transaction, link,
-                                          p1, p2, REVIDX_DEFAULT_FLAGS,
-                                          (baserev, delta), ifh, dfh)
+                                          p1, p2, flags, (baserev, delta),
+                                          ifh, dfh)
                 if not dfh and not self._inline:
                     # addrevision switched from inline to conventional
                     # reopen the index
@@ -1450,6 +1457,10 @@ 
         """Check if a file revision is censored."""
         return False
 
+    def _peek_iscensored(self, baserev, delta, flush):
+        """Quickly check if a delta produces a censored revision."""
+        return False
+
     def getstrippoint(self, minlink):
         """find the minimum rev that must be stripped to strip the linkrev