Patchwork D1940: patch: avoid repeated binary checks if all files in a patch are text

login
register
mail settings
Submitter phabricator
Date Jan. 25, 2018, 10:36 p.m.
Message ID <differential-rev-PHID-DREV-ay2g2pegghz4zztzemr6-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/27063/
State Superseded
Headers show

Comments

phabricator - Jan. 25, 2018, 10:36 p.m.
joerg.sonnenberger created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1940

AFFECTED FILES
  mercurial/mdiff.py
  mercurial/patch.py

CHANGE DETAILS




To: joerg.sonnenberger, #hg-reviewers
Cc: mercurial-devel
phabricator - Jan. 26, 2018, 10:53 a.m.
lothiraldan added a comment.


  Do we have some numbers about how many seconds we win on big diffs?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1940

To: joerg.sonnenberger, #hg-reviewers
Cc: lothiraldan, mercurial-devel
phabricator - Jan. 26, 2018, 12:20 p.m.
joerg.sonnenberger added a comment.


  I haven't seen a performance difference for regular runs, but it does show up in profiles, so eliminating it avoids wasting mental power by developers down the line.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1940

To: joerg.sonnenberger, #hg-reviewers
Cc: lothiraldan, mercurial-devel

Patch

diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -2698,8 +2698,13 @@ 
             if opts.git or losedatafn:
                 flag2 = ctx2.flags(f2)
         # if binary is True, output "summary" or "base85", but not "text diff"
-        binary = not opts.text and any(f.isbinary()
-                                       for f in [fctx1, fctx2] if f is not None)
+        if opts.text:
+            check_binary = True
+            binary = False
+        else:
+            check_binary = any(f.isbinary()
+                               for f in [fctx1, fctx2] if f is not None)
+            binary = check_binary
 
         if losedatafn and not opts.git:
             if (binary or
@@ -2789,7 +2794,8 @@ 
 
             uheaders, hunks = mdiff.unidiff(content1, date1,
                                             content2, date2,
-                                            path1, path2, opts=opts)
+                                            path1, path2, opts=opts,
+                                            check_binary=check_binary)
             header.extend(uheaders)
         yield fctx1, fctx2, header, hunks
 
diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
--- a/mercurial/mdiff.py
+++ b/mercurial/mdiff.py
@@ -234,13 +234,16 @@ 
             yield s, type
         yield s1, '='
 
-def unidiff(a, ad, b, bd, fn1, fn2, opts=defaultopts):
+def unidiff(a, ad, b, bd, fn1, fn2, opts=defaultopts, check_binary=True):
     """Return a unified diff as a (headers, hunks) tuple.
 
     If the diff is not null, `headers` is a list with unified diff header
     lines "--- <original>" and "+++ <new>" and `hunks` is a generator yielding
     (hunkrange, hunklines) coming from _unidiff().
     Otherwise, `headers` and `hunks` are empty.
+
+    Setting `check_binary` to false will skip the binary check, i.e. when
+    it has been done in advance. Files are expected to be text in this case.
     """
     def datetag(date, fn=None):
         if not opts.git and not opts.nodates:
@@ -270,7 +273,7 @@ 
                 text += "\n\ No newline at end of file\n"
             yield text
 
-    if not opts.text and (util.binary(a) or util.binary(b)):
+    if not opts.text and check_binary and (util.binary(a) or util.binary(b)):
         if a and b and len(a) == len(b) and a == b:
             return sentinel
         headerlines = []