Patchwork D2577: mdiff: prefer xdiff for diff calculation

login
register
mail settings
Submitter phabricator
Date March 3, 2018, 12:25 a.m.
Message ID <differential-rev-PHID-DREV-bhkgmejyhy3wulcamx5o-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/28760/
State Superseded
Headers show

Comments

phabricator - March 3, 2018, 12:25 a.m.
ryanmce created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Let's switch to xdiff for its better diff quality and faster performance!
  
  bdiff is still used as a fallback for cases xdiff isn't built, or the pure
  Python version.

TEST PLAN
  Added the "patience" test case mentioned in previous commit. It fails
  with a huge diff output before this change.
  
  I expected some annotate/diff output changes due to diff blocks being
  shifted around. Let's see what sandcastle says. I'll make adjustments
  accordingly.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

AFFECTED FILES
  mercurial/mdiff.py
  tests/hghave.py
  tests/test-diff-antipatience.t

CHANGE DETAILS




To: ryanmce, #hg-reviewers
Cc: mercurial-devel
phabricator - March 3, 2018, 3:10 p.m.
durin42 added inline comments.

INLINE COMMENTS

> mdiff.py:34
>  
> +try:
> +    from .cext import xdiff

Rather than this, let's try to get the xdiffblocks function (and if it's not there set it to none), and then we could rework the below code to use xdiffblocks instead of blocks if experimental.xdiff is set to true - that'll make it easier to do some testing of both codepaths.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

To: ryanmce, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - March 3, 2018, 5:35 p.m.
quark added inline comments.

INLINE COMMENTS

> durin42 wrote in mdiff.py:34
> Rather than this, let's try to get the xdiffblocks function (and if it's not there set it to none), and then we could rework the below code to use xdiffblocks instead of blocks if experimental.xdiff is set to true - that'll make it easier to do some testing of both codepaths.

The diff functions below do not have an `ui` object. Would you be okay if the config is global? ex. something like

  # mdiff.py
  def setdiffalgo(name):
      global blocks
      if name == 'xdiff':
          blocks = ...
      else: ...



  # dispatch.py
      ui = ....
      mdiff.setdiffalgo(...)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

To: ryanmce, #hg-reviewers
Cc: quark, durin42, mercurial-devel
phabricator - March 3, 2018, 7:14 p.m.
ryanmce added inline comments.

INLINE COMMENTS

> quark wrote in mdiff.py:34
> The diff functions below do not have an `ui` object. Would you be okay if the config is global? ex. something like
> 
>   # mdiff.py
>   def setdiffalgo(name):
>       global blocks
>       if name == 'xdiff':
>           blocks = ...
>       else: ...
> 
> 
> 
>   # dispatch.py
>       ui = ....
>       mdiff.setdiffalgo(...)

I chatted with people here at the sprint. I'm going to see if we can pass in the diff algo in from calling functions, falling back to bdiff if none is specified.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

To: ryanmce, #hg-reviewers
Cc: quark, durin42, mercurial-devel
phabricator - March 3, 2018, 7:19 p.m.
indygreg added inline comments.

INLINE COMMENTS

> quark wrote in mdiff.py:34
> The diff functions below do not have an `ui` object. Would you be okay if the config is global? ex. something like
> 
>   # mdiff.py
>   def setdiffalgo(name):
>       global blocks
>       if name == 'xdiff':
>           blocks = ...
>       else: ...
> 
> 
> 
>   # dispatch.py
>       ui = ....
>       mdiff.setdiffalgo(...)

global state is evil. Especially on functionality that calls out to C and may release the GIL. If we have multi-threaded applications, things could get in a very bad state.

I'd encourage you to plumb in the diff algorithm to use into the diffopts. Even if the diff algorithm is only visible for user-visible diffs (as opposed to say storage diffs), that's a win.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

To: ryanmce, #hg-reviewers
Cc: indygreg, quark, durin42, mercurial-devel
phabricator - March 3, 2018, 8:10 p.m.
quark commandeered this revision.
quark added a reviewer: ryanmce.
quark added a comment.


  Took the diffopts approach - note: merge conflicts improvements are gone since `mdiff.get_matching_blocks` is not covered.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2577

To: quark, #hg-reviewers, ryanmce
Cc: indygreg, quark, durin42, mercurial-devel

Patch

diff --git a/tests/test-diff-antipatience.t b/tests/test-diff-antipatience.t
new file mode 100644
--- /dev/null
+++ b/tests/test-diff-antipatience.t
@@ -0,0 +1,41 @@ 
+#require xdiff
+
+Test case that makes use of the weakness of patience diff algorithm
+
+  $ hg init
+  >>> open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
+  $ hg commit -m 1 -A a
+  >>> open('a', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
+  $ hg diff
+  diff -r 6c45a8fe8cb6 a
+  --- a/a	Thu Jan 01 00:00:00 1970 +0000
+  +++ b/a	Thu Jan 01 00:00:00 1970 +0000
+  @@ -1,4 +1,4 @@
+  -a
+  +b
+   x
+   x
+   x
+  @@ -299,7 +299,6 @@
+   x
+   x
+   x
+  -u
+   x
+   x
+   x
+  @@ -700,6 +699,7 @@
+   x
+   x
+   x
+  +u
+   x
+   x
+   x
+  @@ -1000,5 +1000,5 @@
+   x
+   x
+   x
+  -a
+  +b
+   
diff --git a/tests/hghave.py b/tests/hghave.py
--- a/tests/hghave.py
+++ b/tests/hghave.py
@@ -708,3 +708,11 @@ 
         # libfuzzer is new in clang 6
         return int(mat.group(1)) > 5
     return False
+
+@check("xdiff", "xdiff algorithm")
+def has_xdiff():
+    try:
+        from mercurial.cext import xdiff
+        return xdiff.blocks('', '') == [(0, 0, 0, 0)]
+    except ImportError:
+        return False
diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
--- a/mercurial/mdiff.py
+++ b/mercurial/mdiff.py
@@ -25,13 +25,18 @@ 
 bdiff = policy.importmod(r'bdiff')
 mpatch = policy.importmod(r'mpatch')
 
-blocks = bdiff.blocks
 fixws = bdiff.fixws
 patches = mpatch.patches
 patchedsize = mpatch.patchedsize
 textdiff = bdiff.bdiff
 splitnewlines = bdiff.splitnewlines
 
+try:
+    from .cext import xdiff
+    blocks = xdiff.blocks
+except ImportError:
+    blocks = bdiff.blocks
+
 class diffopts(object):
     '''context is the number of context lines
     text treats all files as text
@@ -200,7 +205,7 @@ 
     if opts.ignorews or opts.ignorewsamount or opts.ignorewseol:
         text1 = wsclean(opts, text1, False)
         text2 = wsclean(opts, text2, False)
-    diff = bdiff.blocks(text1, text2)
+    diff = blocks(text1, text2)
     for i, s1 in enumerate(diff):
         # The first match is special.
         # we've either found a match starting at line 0 or a match later
@@ -508,7 +513,7 @@ 
 
 # similar to difflib.SequenceMatcher.get_matching_blocks
 def get_matching_blocks(a, b):
-    return [(d[0], d[2], d[1] - d[0]) for d in bdiff.blocks(a, b)]
+    return [(d[0], d[2], d[1] - d[0]) for d in blocks(a, b)]
 
 def trivialdiffheader(length):
     return struct.pack(">lll", 0, 0, length) if length else ''