Patchwork [v6] cmdutil: add within-line color diff capacity

login
register
mail settings
Submitter matthieu.laneuville@octobus.net
Date Dec. 2, 2017, 12:51 p.m.
Message ID <2cd53502aa3a206eed18.1512219093@carbon>
Download mbox | patch
Permalink /patch/25898/
State Accepted
Headers show

Comments

matthieu.laneuville@octobus.net - Dec. 2, 2017, 12:51 p.m.
# HG changeset patch
# User Matthieu Laneuville <matthieu.laneuville@octobus.net>
# Date 1508944418 -32400
#      Thu Oct 26 00:13:38 2017 +0900
# Node ID 2cd53502aa3a206eed18d1c13ee294b418fb43c5
# Parent  3da4bd50103daab5419fc796d0c62470bab6da03
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2cd53502aa3a
# EXP-Topic inline-diff
cmdutil: add within-line color diff capacity

The `diff' command usually writes deletion in red and insertions in green. This
patch adds within-line colors, to highlight which part of the lines differ.
Lines to compare are decided based on their similarity ratio, as computed by
difflib SequenceMatcher, with an arbitrary threshold (0.7) to decide at which
point two lines are considered entirely different (therefore no inline-diff
required).

The current implementation is kept behind an experimental flag in order to test
the effect on performance. In order to activate it, set inline-color-diff to
true in [experimental].
Yuya Nishihara - Dec. 3, 2017, 8:30 a.m.
On Sat, 02 Dec 2017 21:51:33 +0900, matthieu.laneuville@octobus.net wrote:
> # HG changeset patch
> # User Matthieu Laneuville <matthieu.laneuville@octobus.net>
> # Date 1508944418 -32400
> #      Thu Oct 26 00:13:38 2017 +0900
> # Node ID 2cd53502aa3a206eed18d1c13ee294b418fb43c5
> # Parent  3da4bd50103daab5419fc796d0c62470bab6da03
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2cd53502aa3a
> # EXP-Topic inline-diff
> cmdutil: add within-line color diff capacity

Looks nicer than the previous patch, but found a couple of bugs.

> diff -r 3da4bd50103d -r 2cd53502aa3a mercurial/patch.py
> --- a/mercurial/patch.py	Wed Nov 29 07:57:17 2017 +0530
> +++ b/mercurial/patch.py	Thu Oct 26 00:13:38 2017 +0900
> @@ -10,6 +10,7 @@ from __future__ import absolute_import, 
>  
>  import collections
>  import copy
> +import difflib
>  import email
>  import errno
>  import hashlib
> @@ -2461,6 +2462,12 @@ def diffhunks(repo, node1=None, node2=No
>  
>  def difflabel(func, *args, **kw):
>      '''yields 2-tuples of (output, label) based on the output of func()'''
> +    inlinecolor = False
> +    for arg in args:
> +        if util.safehasattr(arg, 'ui'):
> +            inlinecolor = arg.ui.configbool("experimental", "inline-color-diff")
> +        break

This is ugly. Perhaps the worddiff option can be carried by the opts
argument. See patch.difffeatureopts() for details.

> @@ -2477,6 +2484,7 @@ def difflabel(func, *args, **kw):
>      head = False
>      for chunk in func(*args, **kw):
>          lines = chunk.split('\n')
> +        matches = [-1 for x in range(len(lines) + 1)]
>          for i, line in enumerate(lines):
>              if i != 0:
>                  yield ('\n', '')
> @@ -2504,7 +2512,12 @@ def difflabel(func, *args, **kw):
>                              if '\t' == token[0]:
>                                  yield (token, 'diff.tab')
>                              else:
> -                                yield (token, label)
> +                                if not inlinecolor:
> +                                    yield (token, label)
> +                                else:
> +                                    buff, matches = _worddiff(lines, i, matches)
> +                                    for (color, word) in buff:
> +                                        yield (word, color)
>                      else:
>                          yield (stripline, label)
>                      break
> @@ -2513,6 +2526,63 @@ def difflabel(func, *args, **kw):
>              if line != stripline:
>                  yield (line[len(stripline):], 'diff.trailingwhitespace')
>  
> +def _worddiff(slist, idx, matches):
> +    '''Find match of a given string in current chunk and performs word diff.'''
> +    operation = 'inserted' if slist[idx][0] == '+' else 'deleted'
> +    bound = matches[-1] # last item in matches stores the id of the last match
> +
> +    # inserted lines should only be compared to lines that matched them before
> +    if operation == 'inserted':
> +        if matches[idx] != -1:
> +            return _inlinediff(slist[idx],
> +                                    slist[matches[idx]],
> +                                    operation), matches
> +        else:
> +            return [('diff.' + operation, slist[idx])], matches

a) Needs to return a stripline, not a line.

> +    # deleted lines first need to be matched
> +    for i, line in enumerate(slist[bound + 1:-1]):
> +        if line == '':
> +            continue
> +        if line[0] == '+':
> +            sim = difflib.SequenceMatcher(None, slist[idx], line).ratio()
> +            if sim > 0.7:
> +                matches[i + bound + 1] = idx
> +                matches[-1] = i + bound + 1
> +                return _inlinediff(slist[idx], line, operation), matches
> +    return [('diff.' + operation, slist[idx])], matches

b) We need to limit the search space only to the next inserted lines.

Perhaps (a) and (b) can be simply fixed by 2-pass scanning, in which build
a map of matched lines by 1st pass, and apply _inlinediff() by 2nd pass.

  lines = chunk.split('\n')
  matches = {}
  if inlinecolor:
      # build {deleted/inserted: inserted/deleted} map by looking @, -, + lines
      matches = _findmatches(lines)
  for i, line in enumerate(lines):
      ...
          elif i in matches:
              for w, l in _inlinediff(token, strip(lines[matches[i]])):
                  yield (w, l)
          else:
              yield (stripline, label)

> +def _inlinediff(s1, s2, operation):
> +    '''Perform string diff to highlight specific changes.'''
> +    operation_skip = '+?' if operation == 'deleted' else '-?'
> +    if operation == 'deleted':
> +        s2, s1 = s1, s2
> +
> +    # buffer required to remove last space, there may be smarter ways to do this
> +    buff = []
> +
> +    # we never want to higlight the leading +-
> +    if operation == 'deleted' and s2.startswith('-'):
> +        buff.append(('diff.deleted', '-'))
> +        s2 = s2[1:]
> +        s1 = s1[1:]
> +    elif operation == 'inserted' and s1.startswith('+'):
> +        buff.append(('diff.inserted', '+'))
> +        s2 = s2[1:]
> +        s1 = s1[1:]
> +
> +    s = difflib.ndiff(re.split(r'(\W)', s2), re.split(r'(\W)', s1))
                                  ^^

Nit: please use br'' for Python 3 compatibility.

> +    for line in s:
           ^^^^

Nit: the variable name "line" is confusing.

> +        if line[0] in operation_skip:
> +            continue
> +        l = 'diff.' + operation + '.highlight'
> +        if line[0] in ' ': # unchanged parts
> +            l = 'diff.' + operation
> +        buff.append((l, line[2:]))
> +
> +    buff[-1] = (buff[-1][0], buff[-1][1].strip(' '))
> +    return buff

Maybe we should compact contiguous words of the same label, but that can be
addressed by a follow-up patch.

Patch

diff -r 3da4bd50103d -r 2cd53502aa3a mercurial/color.py
--- a/mercurial/color.py	Wed Nov 29 07:57:17 2017 +0530
+++ b/mercurial/color.py	Thu Oct 26 00:13:38 2017 +0900
@@ -87,12 +87,14 @@  except ImportError:
     'branches.inactive': 'none',
     'diff.changed': 'white',
     'diff.deleted': 'red',
+    'diff.deleted.highlight': 'red bold underline',
     'diff.diffline': 'bold',
     'diff.extended': 'cyan bold',
     'diff.file_a': 'red bold',
     'diff.file_b': 'green bold',
     'diff.hunk': 'magenta',
     'diff.inserted': 'green',
+    'diff.inserted.highlight': 'green bold underline',
     'diff.tab': '',
     'diff.trailingwhitespace': 'bold red_background',
     'changeset.public': '',
diff -r 3da4bd50103d -r 2cd53502aa3a mercurial/configitems.py
--- a/mercurial/configitems.py	Wed Nov 29 07:57:17 2017 +0530
+++ b/mercurial/configitems.py	Thu Oct 26 00:13:38 2017 +0900
@@ -388,6 +388,9 @@  coreconfigitem('experimental', 'evolutio
 coreconfigitem('experimental', 'evolution.track-operation',
     default=True,
 )
+coreconfigitem('experimental', 'inline-color-diff',
+    default=False,
+)
 coreconfigitem('experimental', 'maxdeltachainspan',
     default=-1,
 )
diff -r 3da4bd50103d -r 2cd53502aa3a mercurial/patch.py
--- a/mercurial/patch.py	Wed Nov 29 07:57:17 2017 +0530
+++ b/mercurial/patch.py	Thu Oct 26 00:13:38 2017 +0900
@@ -10,6 +10,7 @@  from __future__ import absolute_import, 
 
 import collections
 import copy
+import difflib
 import email
 import errno
 import hashlib
@@ -2461,6 +2462,12 @@  def diffhunks(repo, node1=None, node2=No
 
 def difflabel(func, *args, **kw):
     '''yields 2-tuples of (output, label) based on the output of func()'''
+    inlinecolor = False
+    for arg in args:
+        if util.safehasattr(arg, 'ui'):
+            inlinecolor = arg.ui.configbool("experimental", "inline-color-diff")
+        break
+
     headprefixes = [('diff', 'diff.diffline'),
                     ('copy', 'diff.extended'),
                     ('rename', 'diff.extended'),
@@ -2477,6 +2484,7 @@  def difflabel(func, *args, **kw):
     head = False
     for chunk in func(*args, **kw):
         lines = chunk.split('\n')
+        matches = [-1 for x in range(len(lines) + 1)]
         for i, line in enumerate(lines):
             if i != 0:
                 yield ('\n', '')
@@ -2504,7 +2512,12 @@  def difflabel(func, *args, **kw):
                             if '\t' == token[0]:
                                 yield (token, 'diff.tab')
                             else:
-                                yield (token, label)
+                                if not inlinecolor:
+                                    yield (token, label)
+                                else:
+                                    buff, matches = _worddiff(lines, i, matches)
+                                    for (color, word) in buff:
+                                        yield (word, color)
                     else:
                         yield (stripline, label)
                     break
@@ -2513,6 +2526,63 @@  def difflabel(func, *args, **kw):
             if line != stripline:
                 yield (line[len(stripline):], 'diff.trailingwhitespace')
 
+def _worddiff(slist, idx, matches):
+    '''Find match of a given string in current chunk and performs word diff.'''
+    operation = 'inserted' if slist[idx][0] == '+' else 'deleted'
+    bound = matches[-1] # last item in matches stores the id of the last match
+
+    # inserted lines should only be compared to lines that matched them before
+    if operation == 'inserted':
+        if matches[idx] != -1:
+            return _inlinediff(slist[idx],
+                                    slist[matches[idx]],
+                                    operation), matches
+        else:
+            return [('diff.' + operation, slist[idx])], matches
+
+    # deleted lines first need to be matched
+    for i, line in enumerate(slist[bound + 1:-1]):
+        if line == '':
+            continue
+        if line[0] == '+':
+            sim = difflib.SequenceMatcher(None, slist[idx], line).ratio()
+            if sim > 0.7:
+                matches[i + bound + 1] = idx
+                matches[-1] = i + bound + 1
+                return _inlinediff(slist[idx], line, operation), matches
+    return [('diff.' + operation, slist[idx])], matches
+
+def _inlinediff(s1, s2, operation):
+    '''Perform string diff to highlight specific changes.'''
+    operation_skip = '+?' if operation == 'deleted' else '-?'
+    if operation == 'deleted':
+        s2, s1 = s1, s2
+
+    # buffer required to remove last space, there may be smarter ways to do this
+    buff = []
+
+    # we never want to higlight the leading +-
+    if operation == 'deleted' and s2.startswith('-'):
+        buff.append(('diff.deleted', '-'))
+        s2 = s2[1:]
+        s1 = s1[1:]
+    elif operation == 'inserted' and s1.startswith('+'):
+        buff.append(('diff.inserted', '+'))
+        s2 = s2[1:]
+        s1 = s1[1:]
+
+    s = difflib.ndiff(re.split(r'(\W)', s2), re.split(r'(\W)', s1))
+    for line in s:
+        if line[0] in operation_skip:
+            continue
+        l = 'diff.' + operation + '.highlight'
+        if line[0] in ' ': # unchanged parts
+            l = 'diff.' + operation
+        buff.append((l, line[2:]))
+
+    buff[-1] = (buff[-1][0], buff[-1][1].strip(' '))
+    return buff
+
 def diffui(*args, **kw):
     '''like diff(), but yields 2-tuples of (output, label) for ui.write()'''
     return difflabel(diff, *args, **kw)
diff -r 3da4bd50103d -r 2cd53502aa3a tests/test-diff-color.t
--- a/tests/test-diff-color.t	Wed Nov 29 07:57:17 2017 +0530
+++ b/tests/test-diff-color.t	Thu Oct 26 00:13:38 2017 +0900
@@ -259,3 +259,95 @@  test tabs
   \x1b[0;32m+\x1b[0m\x1b[0;1;35m	\x1b[0m\x1b[0;32mall\x1b[0m\x1b[0;1;35m		\x1b[0m\x1b[0;32mtabs\x1b[0m\x1b[0;1;41m	\x1b[0m (esc)
 
   $ cd ..
+
+test inline color diff
+
+  $ hg init inline
+  $ cd inline
+  $ cat > file1 << EOF
+  > this is the first line
+  > this is the second line
+  >     third line starts with space
+  > + starts with a plus sign
+  > 
+  > this line won't change
+  > 
+  > two lines are going to
+  > be changed into three!
+  > 
+  > three of those lines will
+  > collapse onto one
+  > (to see if it works)
+  > EOF
+  $ hg add file1
+  $ hg ci -m 'commit'
+  $ cat > file1 << EOF
+  > that is the first paragraph
+  >     this is the second line
+  > third line starts with space
+  > - starts with a minus sign
+  > 
+  > this line won't change
+  > 
+  > two lines are going to
+  > (entirely magically,
+  >  assuming this works)
+  > be changed into four!
+  > 
+  > three of those lines have
+  > collapsed onto one
+  > EOF
+  $ hg diff --config experimental.inline-color-diff=False --color=debug
+  [diff.diffline|diff --git a/file1 b/file1]
+  [diff.file_a|--- a/file1]
+  [diff.file_b|+++ b/file1]
+  [diff.hunk|@@ -1,13 +1,14 @@]
+  [diff.deleted|-this is the first line]
+  [diff.deleted|-this is the second line]
+  [diff.deleted|-    third line starts with space]
+  [diff.deleted|-+ starts with a plus sign]
+  [diff.inserted|+that is the first paragraph]
+  [diff.inserted|+    this is the second line]
+  [diff.inserted|+third line starts with space]
+  [diff.inserted|+- starts with a minus sign]
+   
+   this line won't change
+   
+   two lines are going to
+  [diff.deleted|-be changed into three!]
+  [diff.inserted|+(entirely magically,]
+  [diff.inserted|+ assuming this works)]
+  [diff.inserted|+be changed into four!]
+   
+  [diff.deleted|-three of those lines will]
+  [diff.deleted|-collapse onto one]
+  [diff.deleted|-(to see if it works)]
+  [diff.inserted|+three of those lines have]
+  [diff.inserted|+collapsed onto one]
+  $ hg diff --config experimental.inline-color-diff=True --color=debug
+  [diff.diffline|diff --git a/file1 b/file1]
+  [diff.file_a|--- a/file1]
+  [diff.file_b|+++ b/file1]
+  [diff.hunk|@@ -1,13 +1,14 @@]
+  [diff.deleted|-][diff.deleted|this][diff.deleted| ][diff.deleted|is][diff.deleted| ][diff.deleted|the][diff.deleted| ][diff.deleted.highlight|first][diff.deleted| ][diff.deleted|line]
+  [diff.deleted|-this is the second line]
+  [diff.deleted|-][diff.deleted.highlight| ][diff.deleted.highlight| ][diff.deleted.highlight| ][diff.deleted.highlight| ][diff.deleted|third][diff.deleted| ][diff.deleted|line][diff.deleted| ][diff.deleted|starts][diff.deleted| ][diff.deleted|with][diff.deleted| ][diff.deleted|space]
+  [diff.deleted|-][diff.deleted.highlight|+][diff.deleted| ][diff.deleted|starts][diff.deleted| ][diff.deleted|with][diff.deleted| ][diff.deleted|a][diff.deleted| ][diff.deleted.highlight|plus][diff.deleted| ][diff.deleted|sign]
+  [diff.inserted|+that is the first paragraph]
+  [diff.inserted|+][diff.inserted.highlight| ][diff.inserted.highlight| ][diff.inserted.highlight| ][diff.inserted.highlight| ][diff.inserted|this][diff.inserted| ][diff.inserted|is][diff.inserted| ][diff.inserted|the][diff.inserted| ][diff.inserted.highlight|second][diff.inserted| ][diff.inserted|line]
+  [diff.inserted|+][diff.inserted|third][diff.inserted| ][diff.inserted|line][diff.inserted| ][diff.inserted|starts][diff.inserted| ][diff.inserted|with][diff.inserted| ][diff.inserted|space]
+  [diff.inserted|+][diff.inserted.highlight|-][diff.inserted| ][diff.inserted|starts][diff.inserted| ][diff.inserted|with][diff.inserted| ][diff.inserted|a][diff.inserted| ][diff.inserted.highlight|minus][diff.inserted| ][diff.inserted|sign]
+   
+   this line won't change
+   
+   two lines are going to
+  [diff.deleted|-][diff.deleted|be][diff.deleted| ][diff.deleted|changed][diff.deleted| ][diff.deleted|into][diff.deleted| ][diff.deleted.highlight|three][diff.deleted|!]
+  [diff.inserted|+(entirely magically,]
+  [diff.inserted|+ assuming this works)]
+  [diff.inserted|+][diff.inserted|be][diff.inserted| ][diff.inserted|changed][diff.inserted| ][diff.inserted|into][diff.inserted| ][diff.inserted.highlight|four][diff.inserted|!]
+   
+  [diff.deleted|-][diff.deleted|three][diff.deleted| ][diff.deleted|of][diff.deleted| ][diff.deleted|those][diff.deleted| ][diff.deleted|lines][diff.deleted| ][diff.deleted.highlight|will]
+  [diff.deleted|-][diff.deleted.highlight|collapse][diff.deleted| ][diff.deleted|onto][diff.deleted| ][diff.deleted|one]
+  [diff.deleted|-(to see if it works)]
+  [diff.inserted|+][diff.inserted|three][diff.inserted| ][diff.inserted|of][diff.inserted| ][diff.inserted|those][diff.inserted| ][diff.inserted|lines][diff.inserted| ][diff.inserted.highlight|have]
+  [diff.inserted|+][diff.inserted.highlight|collapsed][diff.inserted| ][diff.inserted|onto][diff.inserted| ][diff.inserted|one]