Patchwork [1 of 2 censor RFC] revlog: special case expanding full-replacement deltas received by exchange

Submitter adgar@google.com
Date Feb. 10, 2015, 9:39 p.m.
Message ID <78f2d6b8fe46e1eddea5.1423604384@adgar.nyc.corp.google.com>
Permalink /patch/7777/
State Superseded
Commit 155a0643986382be251507222cb45ded9e7107d3

Comments

adgar@google.com - Feb. 10, 2015, 9:39 p.m.
# HG changeset patch
# User Mike Edgar <adgar@google.com>
# Date 1423186696 0
#      Fri Feb 06 01:38:16 2015 +0000
# Node ID 78f2d6b8fe46e1eddea5758434dbb7d758ae1a1b
# Parent  521c330b1ad7af3e9da78750defc534ab981c1ff
revlog: special case expanding full-replacement deltas received by exchange

When a delta received through exchange is added to a revlog, it is very
often expanded to a fulltext by applying the delta to its base. If the
delta is a full replacement (a single hunk that replaces the entire base
text), we can skip decoding the base revision entirely. This avoids an
exception if the base revision is censored.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
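The full-replacement check described in the commit message can be sketched in isolation. This is a minimal sketch, assuming the binary delta layout implied by the patch's `struct.calcsize(">lll")`: each hunk is a big-endian `(start, end, length)` header followed by `length` bytes that replace `base[start:end]`, and a full-replacement header is assumed to be `(0, len(base), len(newtext))`, which is what the patch compares against `mdiff.replacediffheader(...)`. The helper names `replacement_delta`, `is_full_replacement`, and `apply_delta` are hypothetical, not Mercurial's API.

```python
import struct

# Assumed hunk layout: big-endian (start, end, length) header, then `length`
# bytes of new data that replace base[start:end].
HUNK_HEADER = ">lll"
HLEN = struct.calcsize(HUNK_HEADER)  # 12 bytes, as in the patch


def replacement_delta(baselen, newtext):
    """Build a one-hunk delta replacing all baselen bytes of the base."""
    return struct.pack(HUNK_HEADER, 0, baselen, len(newtext)) + newtext


def is_full_replacement(delta, baselen):
    """True if delta is a single hunk covering base[0:baselen].

    Only the base's length is needed, never its content.
    """
    if len(delta) < HLEN:
        return False
    expected = struct.pack(HUNK_HEADER, 0, baselen, len(delta) - HLEN)
    return delta[:HLEN] == expected


def apply_delta(base, delta):
    """Apply sorted, non-overlapping hunks to base (the general slow path)."""
    pieces = []
    last = 0  # end offset of the previous hunk in base
    pos = 0   # read position in delta
    while pos < len(delta):
        start, end, length = struct.unpack(HUNK_HEADER, delta[pos:pos + HLEN])
        pos += HLEN
        pieces.append(base[last:start])         # unchanged bytes before hunk
        pieces.append(delta[pos:pos + length])  # replacement bytes
        pos += length
        last = end
    pieces.append(base[last:])                  # unchanged tail
    return b"".join(pieces)
```

For a full-replacement delta, `apply_delta(base, delta)` returns `delta[HLEN:]` whatever the base contains, which is why the fast path can skip reading the base text.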
Martin von Zweigbergk - Feb. 10, 2015, 10:02 p.m.
On Tue Feb 10 2015 at 1:40:34 PM Mike Edgar <adgar@google.com> wrote:

> +            # special case deltas which replace entire base; no need to decode
> +            # base revision, which is cheaper and neatly avoids censored bases.


Is it noticeably cheaper? Would it be most useful on binary content? And
least useful on big files with small changes? I'm asking mostly out of
curiosity; I'm not really worried about the cost.
adgar@google.com - Feb. 10, 2015, 11:14 p.m.
On Tue, Feb 10, 2015 at 5:02 PM, Martin von Zweigbergk <
martinvonz@google.com> wrote:

> On Tue Feb 10 2015 at 1:40:34 PM Mike Edgar <adgar@google.com> wrote:
>
>> +            # special case deltas which replace entire base; no need to decode
>> +            # base revision, which is cheaper and neatly avoids censored bases.
>
> Is it noticeably cheaper? Would it be most useful on binary content? And
> least useful on big files with small changes? I'm asking mostly out of
> curiosity; I'm not really worried about the cost.

I'm actually going to remove that "cheaper" bit from the comment, because
the point of the change is solely to support censored base revisions. And
in so doing, dodge answering a question to which I don't have the answer :)

Patch

diff -r 521c330b1ad7 -r 78f2d6b8fe46 mercurial/revlog.py
--- a/mercurial/revlog.py	Fri Feb 06 00:55:29 2015 +0000
+++ b/mercurial/revlog.py	Fri Feb 06 01:38:16 2015 +0000
@@ -1233,8 +1233,17 @@ 
             if dfh:
                 dfh.flush()
             ifh.flush()
-            basetext = self.revision(self.node(cachedelta[0]))
-            btext[0] = mdiff.patch(basetext, cachedelta[1])
+            baserev = cachedelta[0]
+            delta = cachedelta[1]
+            # special case deltas which replace entire base; no need to decode
+            # base revision, which is cheaper and neatly avoids censored bases.
+            hlen = struct.calcsize(">lll")
+            if delta[:hlen] == mdiff.replacediffheader(self.rawsize(baserev),
+                                                       len(delta) - hlen):
+                btext[0] = delta[hlen:]
+            else:
+                basetext = self.revision(self.node(baserev))
+                btext[0] = mdiff.patch(basetext, delta)
             try:
                 self.checkhash(btext[0], p1, p2, node)
                 if flags & REVIDX_ISCENSORED:
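Read as a standalone decision, the patched hunk needs only the base revision's length on the fast path. The sketch below models that with a toy store; `ToyRevlog`, `CensoredBaseError`, and `buildtext` are hypothetical names, not Mercurial's classes, the base length is assumed to be available without decoding the base, and general delta application is passed in as a `patch` callback (standing in for `mdiff.patch`) rather than reimplemented.

```python
import struct

HLEN = struct.calcsize(">lll")  # 12-byte hunk header: start, end, length


class CensoredBaseError(Exception):
    """Hypothetical stand-in for the error raised when reading censored data."""


class ToyRevlog:
    """Hypothetical store mapping rev -> fulltext, some revisions censored."""

    def __init__(self, texts, censored=()):
        self.texts = texts
        self.censored = set(censored)

    def rawsize(self, rev):
        # Assumed to be known from the index without decoding the content.
        return len(self.texts[rev])

    def revision(self, rev):
        # Reading a censored revision's content fails.
        if rev in self.censored:
            raise CensoredBaseError("revision %d is censored" % rev)
        return self.texts[rev]

    def buildtext(self, baserev, delta, patch):
        """Mirror the patched hunk: skip the base for full replacements."""
        header = struct.pack(">lll", 0, self.rawsize(baserev), len(delta) - HLEN)
        if delta[:HLEN] == header:
            return delta[HLEN:]  # base text never read
        return patch(self.revision(baserev), delta)  # may raise if censored
```

With revision 0 censored, a full-replacement delta still expands, while any other delta raises because the base text must be read. That is the behavior the commit message relies on; the review left open whether the fast path is also measurably cheaper.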