Patchwork [3,of,3] diff: add a fast path to avoid loading binary contents

login
register
mail settings
Submitter Jun Wu
Date May 4, 2017, 7 a.m.
Message ID <d3b641866aa3ee2ff36a.1493881215@x1c>
Download mbox | patch
Permalink /patch/20426/
State Changes Requested
Headers show

Comments

Jun Wu - May 4, 2017, 7 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1493880641 25200
#      Wed May 03 23:50:41 2017 -0700
# Node ID d3b641866aa3ee2ff36a62b6e05f9e8849ef1ae6
# Parent  12370108647046bd6ec1af8365ee908746a135f3
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r d3b641866aa3
diff: add a fast path to avoid loading binary contents

When diffing binary contents, with certain configs, we can show
"Binary file <name> has changed" without actual content.

That allows a fast path where we could avoid providing actual binary
contents. Note: in that case we still need to test if two contents are the
same, that's done by using "filectx.cmp", which could have its own fast
path.
Yuya Nishihara - May 5, 2017, 2:26 a.m.
On Thu, 4 May 2017 00:00:15 -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1493880641 25200
> #      Wed May 03 23:50:41 2017 -0700
> # Node ID d3b641866aa3ee2ff36a62b6e05f9e8849ef1ae6
> # Parent  12370108647046bd6ec1af8365ee908746a135f3
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d3b641866aa3
> diff: add a fast path to avoid loading binary contents
> 
> When diffing binary contents, with certain configs, we can show
> "Binary file <name> has changed" without actual content.
> 
> That allows a fast path where we could avoid providing actual binary
> contents. Note: in that case we still need to test if two contents are the
> same, that's done by using "filectx.cmp", which could have its own fast
> path.
> 
> diff --git a/mercurial/patch.py b/mercurial/patch.py
> --- a/mercurial/patch.py
> +++ b/mercurial/patch.py
> @@ -2533,10 +2533,8 @@ def trydiff(repo, revs, ctx1, ctx2, modi
>          if f1:
>              fctx1 = getfilectx(f1, ctx1)
> -            content1 = fctx1.data()
>              if opts.git or losedatafn:
>                  flag1 = ctx1.flags(f1)
>          if f2:
>              fctx2 = getfilectx(f2, ctx2)
> -            content2 = fctx2.data()
>              if opts.git or losedatafn:
>                  flag2 = ctx2.flags(f2)
> @@ -2585,4 +2583,22 @@ def trydiff(repo, revs, ctx1, ctx2, modi
>              header.append(diffline(path1, revs))
>  
> +        if binary and not opts.text and (opts.nobinary or not opts.git):

Real contents are necessary if 'opts.git and opts.index > 0' (even if
opts.nobinary.)

> +            # fast path: no binary content will be displayed, content1 and
> +            # content2 are only used for equivalent test. cmp() could have a
> +            # fast path.
> +            if fctx1 is not None:
> +                content1 = b'\0'
> +            if fctx2 is not None:
> +                if fctx1 is not None and not fctx1.cmp(fctx2):
> +                    content2 = b'\0' # not different
> +                else:
> +                    content2 = b'\0\0'
> +        else:
> +            # normal path: load contents
> +            if fctx1 is not None:
> +                content1 = fctx1.data()
> +            if fctx2 is not None:
> +                content2 = fctx2.data()
> +
>          if binary and opts.git and not opts.nobinary and not opts.text:
>              text = mdiff.b85diff(content1, content2)
Jun Wu - May 5, 2017, 4:44 a.m.
Excerpts from Yuya Nishihara's message of 2017-05-05 11:26:23 +0900:
> [...]
> > +        if binary and not opts.text and (opts.nobinary or not opts.git):
> 
> Real contents are necessary if 'opts.git and opts.index > 0' (even if
> opts.nobinary.)

Good catch. I was using "not opts.git" and later became greedy and added
"or opts.nobinary" and became wrong.  I'll send V2.

Patch

diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -2533,10 +2533,8 @@  def trydiff(repo, revs, ctx1, ctx2, modi
         if f1:
             fctx1 = getfilectx(f1, ctx1)
-            content1 = fctx1.data()
             if opts.git or losedatafn:
                 flag1 = ctx1.flags(f1)
         if f2:
             fctx2 = getfilectx(f2, ctx2)
-            content2 = fctx2.data()
             if opts.git or losedatafn:
                 flag2 = ctx2.flags(f2)
@@ -2585,4 +2583,22 @@  def trydiff(repo, revs, ctx1, ctx2, modi
             header.append(diffline(path1, revs))
 
+        if binary and not opts.text and (opts.nobinary or not opts.git):
+            # fast path: no binary content will be displayed, content1 and
+            # content2 are only used for equivalent test. cmp() could have a
+            # fast path.
+            if fctx1 is not None:
+                content1 = b'\0'
+            if fctx2 is not None:
+                if fctx1 is not None and not fctx1.cmp(fctx2):
+                    content2 = b'\0' # not different
+                else:
+                    content2 = b'\0\0'
+        else:
+            # normal path: load contents
+            if fctx1 is not None:
+                content1 = fctx1.data()
+            if fctx2 is not None:
+                content2 = fctx2.data()
+
         if binary and opts.git and not opts.nobinary and not opts.text:
             text = mdiff.b85diff(content1, content2)