Patchwork D3212: patch: implement a new worddiff algorithm

login
register
mail settings
Submitter phabricator
Date April 9, 2018, 10:59 p.m.
Message ID <differential-rev-PHID-DREV-ie3m53wgwy6nwd5uk3h4-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/30621/
State Superseded
Headers show

Comments

phabricator - April 9, 2018, 10:59 p.m.
quark created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The previous worddiff algorithm has many problems. The major problem is it
  does a "similarity check" that selects a subset of matched lines to do
  inline diffs. It is a bad idea because:
  
  - The "similarity check" is non-obvious to users. For example, a simple change from "long long x" to "int64_t x" will fail the similarity check and won't be diff-ed as expected.
  - Selecting "lines" to diff won't work as people expect if there are line wrapping changes.
  - It has a sad time complexity if lines do not match, could be O(N^2)-ish.
  
  There are other problems in implementation details.
  
  - Lines can match across distant hunks (if the next hunk does not have "-" lines).
  - "difflib" is slow.
  
  The solution would be removing the "similarity check", and just diff all
  words in a same hunk. So no content will be missed and everything will be
  diff-ed as expected. This is similar to what code review tool like
  Phabricator does.
  
  This diff implements the word diff algorithm as described above. It also
  avoids difflib to be faster.
  
  Note about colors: To be consistent, "changed inserted" parts and "purely
  insertion blocks" should have a same color, since they do not exist in the
  previous version. Instead of highlighting differences, this patch chooses to
  dim common parts. This is also more consistent with Phabricator or GitHub
  webpage. That said, the labels are defined in a way that people can still
  highlight changed parts and leave purely inserted/deleted hunks use the
  "non-highlighted" color.
  
  As one example, running:
  
    hg log -pr df50b87d8f736aff8dc281f816bddcd6f306930c mercurial/commands.py \
      --config experimental.worddiff=1 --color=debug --config diff.unified=0
  
  The previous algorithm outputs:
  
    [diff.file_a|--- a/mercurial/commands.py	Fri Mar 09 15:53:41 2018 +0100]
    [diff.file_b|+++ b/mercurial/commands.py	Sat Mar 10 12:33:19 2018 +0530]
    [diff.hunk|@@ -2039,1 +2039,4 @@]
    [diff.deleted|-][diff.deleted.highlight|@command('^forget',][diff.deleted| ][diff.deleted.highlight|walkopts,][diff.deleted| _('[OPTION]... FILE...'), inferrepo=True)]
    [diff.inserted|+@command(]
    [diff.inserted|+    '^forget',]
    [diff.inserted|+    walkopts + dryrunopts,]
    [diff.inserted|+ ][diff.inserted.highlight|  ][diff.inserted| _('[OPTION]... FILE...'), inferrepo=True)]
    [diff.hunk|@@ -2074,1 +2077,3 @@]
    [diff.deleted|-    rejected = cmdutil.forget(ui, repo, m, prefix="",][diff.deleted.highlight| explicitonly=False)[0]]
    [diff.inserted|+    dryrun = opts.get(r'dry_run')]
    [diff.inserted|+    rejected = cmdutil.forget(ui, repo, m, prefix="",]
    [diff.inserted|+                              explicitonly=False, dryrun=dryrun)[0]]
  
  The new algorithm outputs:
  
    [diff.file_a|--- a/mercurial/commands.py	Fri Mar 09 15:53:41 2018 +0100]
    [diff.file_b|+++ b/mercurial/commands.py	Sat Mar 10 12:33:19 2018 +0530]
    [diff.hunk|@@ -2039,1 +2039,4 @@]
    [diff.deleted|-][diff.deleted.unchanged|@command(][diff.deleted.unchanged|'^forget',][diff.deleted.unchanged| ][diff.deleted.changed|walkopts][diff.deleted.unchanged|,][diff.deleted.changed| ][diff.deleted.unchanged|_('[OPTION]... FILE...'), inferrepo=True)]
    [diff.inserted|+][diff.inserted.unchanged|@command(]
    [diff.inserted|+][diff.inserted.changed|    ][diff.inserted.unchanged|'^forget',]
    [diff.inserted|+][diff.inserted.changed|    walkopts][diff.inserted.unchanged| ][diff.inserted.changed|+ dryrunopts][diff.inserted.unchanged|,]
    [diff.inserted|+][diff.inserted.changed|    ][diff.inserted.unchanged|_('[OPTION]... FILE...'), inferrepo=True)]
    [diff.hunk|@@ -2074,1 +2077,3 @@]
    [diff.deleted|-][diff.deleted.unchanged|    rejected = cmdutil.forget(ui, repo, m, prefix="",][diff.deleted.changed| ][diff.deleted.unchanged|explicitonly=False][diff.deleted.unchanged|)[0]]
    [diff.inserted|+][diff.inserted.changed|    dryrun = opts.get(r'dry_run')]
    [diff.inserted|+][diff.inserted.unchanged|    rejected = cmdutil.forget(ui, repo, m, prefix="",]
    [diff.inserted|+][diff.inserted.changed|                              ][diff.inserted.unchanged|explicitonly=False][diff.inserted.changed|, dryrun=dryrun][diff.inserted.unchanged|)[0]]
  
  Practically, when diffing a 8k line change, the time spent on worddiff
  reduces from 4 seconds to 0.14 seconds.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

AFFECTED FILES
  mercurial/color.py
  mercurial/patch.py
  tests/test-diff-color.t

CHANGE DETAILS




To: quark, #hg-reviewers
Cc: mercurial-devel, spectral
phabricator - April 9, 2018, 11:11 p.m.
quark added a comment.


  This is the before and after comparison:
  
  F69700: worddiff-compare.png <https://phab.mercurial-scm.org/F69700>

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42
Cc: mercurial-devel
phabricator - April 10, 2018, 1:03 a.m.
spectral added inline comments.

INLINE COMMENTS

> color.py:94
> +    'diff.deleted.changed': 'red',
> +    'diff.deleted.unchanged': 'red dim',
>      'diff.diffline': 'bold',

These are the first uses of 'dim' in the default set of things, and I don't think we can rely on it; for color.mode=auto, we really mean "ansi" (aka ecma48) (unless on windows), and don't do any detection of capabilities, so we just output \e[2m and some terminals just ignore that (like mine, rxvt-unicode v9.22).  If using color.mode=terminfo, we at least get error messages (I did --config color.log.user='dim green'):

ignoring unknown color/effect 'dim' (configured in color.log.user)

Apparently cygwin doesn't advertise 'dim', and while the linux virtual console advertises it and supports it, it doesn't actually support a dim color (at least on my machine), it just always puts it in a weird blue :)

I think I'd prefer that `changed` be bold and `unchanged` be non-bold.  For most terminals, that'll lead to a visible difference in intensity (bold being brighter unless using a weird palette), and for those that aren't configured for that, it'll at least be a heavier weight.  It's better than having literally zero difference between them without any explanation why.  I think it'll also be more obvious which lines have it; in your screenshot the difference between dim and regular is pretty subtle.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42
Cc: spectral, mercurial-devel
phabricator - April 10, 2018, 1:57 a.m.
quark added inline comments.

INLINE COMMENTS

> spectral wrote in color.py:94
> These are the first uses of 'dim' in the default set of things, and I don't think we can rely on it; for color.mode=auto, we really mean "ansi" (aka ecma48) (unless on windows), and don't do any detection of capabilities, so we just output \e[2m and some terminals just ignore that (like mine, rxvt-unicode v9.22).  If using color.mode=terminfo, we at least get error messages (I did --config color.log.user='dim green'):
> 
> ignoring unknown color/effect 'dim' (configured in color.log.user)
> 
> Apparently cygwin doesn't advertise 'dim', and while the linux virtual console advertises it and supports it, it doesn't actually support a dim color (at least on my machine), it just always puts it in a weird blue :)
> 
> I think I'd prefer that `changed` be bold and `unchanged` be non-bold.  For most terminals, that'll lead to a visible difference in intensity (bold being brighter unless using a weird palette), and for those that aren't configured for that, it'll at least be a heavier weight.  It's better than having literally zero difference between them without any explanation why.  I think it'll also be more obvious which lines have it; in your screenshot the difference between dim and regular is pretty subtle.

As I mentioned in the summary, I believe `diff.inserted` and `diff.inserted.changed` should have a same color. And `diff.inserted` probably shouldn't be bold.

Looking at this review page, you will notice the `diff.inserted` and `diff.inserted.changed` are using a same color, where `diff.inserted.unchanged` is using a different (lighter) background color.

`dim` works fine where it is supported. For terminals that do not support it, people can override the settings. For `dim` feature detection, that's an issue in the color code which is unrelated to this change. Since worddiff is experimental and off by default, I don't think dim detection should block this patch.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42
Cc: spectral, mercurial-devel
phabricator - April 10, 2018, 3 p.m.
yuja requested changes to this revision.
yuja added a comment.
This revision now requires changes to proceed.


  I have no opinion about the "dim" thingy, but the series generally looks
  good to me.
  
  Thanks for tackling on the painfully slow `SequenceMatcher.ratio()` issue.

INLINE COMMENTS

> patch.py:53
>  tabsplitter = re.compile(br'(\t+|[^\t]+)')
> -_nonwordre = re.compile(br'([^a-zA-Z0-9_\x80-\xff])')
> +wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
> +                          '[^ \ta-zA-Z0-9_\x80-\xff])')

Nit: `_wordsplitter` as it is private constant

> patch.py:54
> +wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
> +                          '[^ \ta-zA-Z0-9_\x80-\xff])')
>  

Missed `br''` here though "\t" and "\x" of string escape are compatible with regexp's.

> patch.py:2536
> +        for token in mdiff.splitnewlines(''.join(bl[b1:b2])):
> +            btokens.append((changed, token))
> +

Nit: maybe we can sort out tokens here instead of re-parsing tabs, newlines, trailing whitespaces later.

But I'm not sure if that will make things simpler.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: yuja, spectral, mercurial-devel
phabricator - April 10, 2018, 3:40 p.m.
yuja added a comment.


  BTW, what's the default of git?
  
  The comment in color.py is a bit scary, which says "most terminals don't
  support dim." If that's true, we shouldn't use "dim" by default.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: yuja, spectral, mercurial-devel
phabricator - April 10, 2018, 6:46 p.m.
quark added a comment.


  Git first had a `contrib/diff-highlight/diff-highlight` script which inverts foreground/background for hunks with len(deleted_lines) = len(inserted_lines).
  
  Then the latest version shows diff inline. That is:
  
    common words [+inserted words with green color][-deleted words with red color] common words
  
  I dislike that, since it could be ambiguous ("[+ x]" could be part of the original text).
  
  For colors, it's really a hard question. I think if we can detect "dim" is unsupported and make it a no-op, then it'd be fine to use. `infocmp` can report "dim" correctly on my Linux terminal. Internally, we patched color.py to use 16 and 256 colors even if terminfo reports 8 colors. But I guess that's not an acceptable solution here.

INLINE COMMENTS

> yuja wrote in patch.py:2536
> Nit: maybe we can sort out tokens here instead of re-parsing tabs, newlines, trailing whitespaces later.
> 
> But I'm not sure if that will make things simpler.

For a split list `['\n', '\t', ' ']`, mdiff might return a hunk that joins them. So `''.join(al[a1:a2])` will become more complex.

I think having mdiff step free from EOL/tab handling makes the code easier to read.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: yuja, spectral, mercurial-devel
phabricator - April 10, 2018, 7:39 p.m.
spectral added subscribers: dhduvall, indygreg.
spectral added inline comments.

INLINE COMMENTS

> quark wrote in color.py:94
> As I mentioned in the summary, I believe `diff.inserted` and `diff.inserted.changed` should have a same color. And `diff.inserted` probably shouldn't be bold.
> 
> Looking at this review page, you will notice the `diff.inserted` and `diff.inserted.changed` are using a same color, where `diff.inserted.unchanged` is using a different (lighter) background color.
> 
> `dim` works fine where it is supported. For terminals that do not support it, people can override the settings. For `dim` feature detection, that's an issue in the color code which is unrelated to this change. Since worddiff is experimental and off by default, I don't think dim detection should block this patch.

I understand the reasoning for wanting `diff.inserted` and `diff.inserted.changed` to be the same color, but unfortunately I think it might not be super feasible in upstream.

'dim' support in my terminfo database (and via testing) is actually pretty common among the biggest terminals; for oddities, I'm seeing:

- the linux console (very few people use it) has weird support
- screen simulates it with underline, tmux passes it through (so I don't see it)
- rxvt-unicode doesn't support it
- the terminfo profile for cygwin doesn't indicate support for it
- Apple's terminal advertises itself as xterm-compatible and seems to support it
- iTerm2 similarly

I'm willing to be convinced it's OK, especially since this mode isn't the default.  @dhduvall originally wrote the "not supported by many terminals" bit, I wonder if they have any suggestions.  @indygreg has also dealt with terminal issues (and deals with windows more than me, so might know more there).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 10, 2018, 9:11 p.m.
quark added inline comments.

INLINE COMMENTS

> spectral wrote in color.py:94
> I understand the reasoning for wanting `diff.inserted` and `diff.inserted.changed` to be the same color, but unfortunately I think it might not be super feasible in upstream.
> 
> 'dim' support in my terminfo database (and via testing) is actually pretty common among the biggest terminals; for oddities, I'm seeing:
> 
> - the linux console (very few people use it) has weird support
> - screen simulates it with underline, tmux passes it through (so I don't see it)
> - rxvt-unicode doesn't support it
> - the terminfo profile for cygwin doesn't indicate support for it
> - Apple's terminal advertises itself as xterm-compatible and seems to support it
> - iTerm2 similarly
> 
> I'm willing to be convinced it's OK, especially since this mode isn't the default.  @dhduvall originally wrote the "not supported by many terminals" bit, I wonder if they have any suggestions.  @indygreg has also dealt with terminal issues (and deals with windows more than me, so might know more there).

There are not many choices - dim, 16/256 colors, or bold. We ended up with 16/256 colors internally for wider support (ex. tmux). But I'd like to express my (strong) options that:

- diff.inserted.changed and diff.inserted are same
- diff.inserted is not bold

in this patch. Because color.py has no 16/256 support yet (and if it does detection conservatively, most terminals will only report 8 color support). And the only remaining choice is "dim". For weird terminals like tmux, I think it's their bugs to fix, not this patch.

That said, I'm fine with changing the defaults to whatever. So feel free to send follow-ups changing it.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 11, 2018, 2:42 p.m.
yuja added a comment.


  > That said, I'm fine with changing the defaults to whatever. So feel
  >  free to send follow-ups changing it.
  
  Can you split a patch changing the color scheme so we can easily
  back it out as needed?

INLINE COMMENTS

> quark wrote in patch.py:2536
> For a split list `['\n', '\t', ' ']`, mdiff might return a hunk that joins them. So `''.join(al[a1:a2])` will become more complex.
> 
> I think having mdiff step free from EOL/tab handling makes the code easier to read.

Ah, got it. Let's leave mdiff refactoring until later, then.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 11, 2018, 9:07 p.m.
quark added a comment.


  In https://phab.mercurial-scm.org/D3212#51917, @yuja wrote:
  
  > Can you split a patch changing the color scheme so we can easily
  >  back it out as needed?
  
  
  Note the color configs are not entirely equivalent to the old code. To give an example:
  
    -LINE-1
    -LINE-2
    -LINE-3
    -LINE-4-FOO
    +LINE-4-BAR
  
  The old code will use "normal" color for LINE-1 to LINE-3. And highlight only "FOO", "BAR". The new code will treat LINE-1 to 3 and "FOO", "BAR" the same, because they are the "changed" part, and treat "LINE-4-" differently. If we use "bold" for "changed" then it's much easier to make large hunks of code bold, which will look noisy.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 12, 2018, 11:51 a.m.
yuja added a comment.


  > Note the color configs are not entirely equivalent to the old code.
  
  I know. It would be roughly equivalent to the old code of `sim >= 0.0`.
  
  > Therefore I still think it's reasonable to avoid using "bold" in this patch to avoid noisy outputs.
  
  I agree on that, but the default output has to be somewhat distinguishable
  even if "dim" isn't supported.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 12, 2018, 2:10 p.m.
dhduvall added inline comments.

INLINE COMMENTS

> quark wrote in color.py:94
> There are not many choices - dim, 16/256 colors, or bold. We ended up with 16/256 colors internally for wider support (ex. tmux). But I'd like to express my (strong) options that:
> 
> - diff.inserted.changed and diff.inserted are same
> - diff.inserted is not bold
> 
> in this patch. Because color.py has no 16/256 support yet (and if it does detection conservatively, most terminals will only report 8 color support). And the only remaining choice is "dim". For weird terminals like tmux, I think it's their bugs to fix, not this patch.
> 
> That said, I'm fine with changing the defaults to whatever. So feel free to send follow-ups changing it.

I'm not sure what testing I did to claim "most", but I did make that comment in 2011, and xterm didn't start supporting dim until 2014 (it had long supported invisible), so that certainly helped inform that comment. Given @spectral's survey, I'd probably rewrite the comment simply to say that we're using `ui.debug()` to not be noisy in the wide world of different terminals.

Given support by xterm, iTerm2, and Terminal.app (and I just verified gnome-terminal supports it, too), I'd lean towards just using dim whenever it made sense now, though it'd be nice to know that at least one major Windows terminal app supported it, too. I don't know how widespread rxvt-unicode is in the Linux world, or other terminal emulators that might not support it.

I won't comment on the dim-the-unchanged vs bright-the-changed bits, or how this interacts with the ANSI color mode, other than to say that the default colors aren't an interface, and can be tweaked later without worries about backwards compatibility. Obviously you don't want to change them radically without good reason once people get used to them, but tweaking dim/bold/normal/background/whatever once people have had a chance to experiment more widely might be a reasonable thing to do.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 13, 2018, 12:42 a.m.
spectral added inline comments.

INLINE COMMENTS

> dhduvall wrote in color.py:94
> I'm not sure what testing I did to claim "most", but I did make that comment in 2011, and xterm didn't start supporting dim until 2014 (it had long supported invisible), so that certainly helped inform that comment. Given @spectral's survey, I'd probably rewrite the comment simply to say that we're using `ui.debug()` to not be noisy in the wide world of different terminals.
> 
> Given support by xterm, iTerm2, and Terminal.app (and I just verified gnome-terminal supports it, too), I'd lean towards just using dim whenever it made sense now, though it'd be nice to know that at least one major Windows terminal app supported it, too. I don't know how widespread rxvt-unicode is in the Linux world, or other terminal emulators that might not support it.
> 
> I won't comment on the dim-the-unchanged vs bright-the-changed bits, or how this interacts with the ANSI color mode, other than to say that the default colors aren't an interface, and can be tweaked later without worries about backwards compatibility. Obviously you don't want to change them radically without good reason once people get used to them, but tweaking dim/bold/normal/background/whatever once people have had a chance to experiment more widely might be a reasonable thing to do.

Ah, didn't know that xterm support for it was that recent :)

I'm fine with assuming that rxvt-unicode isn't used enough to warrant avoiding it here, especially if someone with a windows machine can chime in with whether or not it's supported.  For platforms where the default terminal emulator doesn't support it, the maintainers can adjust it to be whatever they like (or we could do so if we create official packages for that platform/distribution).

tmux *does* support it, it just doesn't support *emulating* it if the terminal on the outside of tmux doesn't.

I'm fine with the change using dim.  Maybe I can use this to get rxvt-unicode to improve their support. :P

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 13, 2018, 2:37 a.m.
quark added a comment.


  I think one alternative is just to use `green_background` like `git/contrib/diff-highlight/diff-highlight.perl`. It satisfies all properties I'd like to have, and is supported by weird terminals including cmd.exe and less.exe. And is different from `diff.file_a`, `diff.file_b` colors. I'll probably just use this.
  
  F71015: 2018-04-12-193346_956x996_scrot.png <https://phab.mercurial-scm.org/F71015>

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 13, 2018, 4:56 a.m.
dhduvall added a comment.


  I can't really tell what's going on there, frankly, unlike with the dim case, where it make sense, but I can always change the colors myself.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 16, 2018, 11:01 p.m.
durin42 added a comment.


  In the name of progress, I'm going to land these. If we find problems with color defaults, let's try and fix them in the RC period, but it's still an experimental feature so we've got plenty of wiggle room.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
phabricator - April 16, 2018, 11:04 p.m.
durin42 added a comment.


  (I'll say that red-dimmed text is very hard for this moderate protan with a black terminal window.)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel

Patch

diff --git a/tests/test-diff-color.t b/tests/test-diff-color.t
--- a/tests/test-diff-color.t
+++ b/tests/test-diff-color.t
@@ -337,41 +337,39 @@ 
   [diff.deleted|-(to see if it works)]
   [diff.inserted|+three of those lines have]
   [diff.inserted|+collapsed onto one]
-#if false
   $ hg diff --config experimental.worddiff=True --color=debug
   [diff.diffline|diff --git a/file1 b/file1]
   [diff.file_a|--- a/file1]
   [diff.file_b|+++ b/file1]
   [diff.hunk|@@ -1,16 +1,17 @@]
-  [diff.deleted|-this is the ][diff.deleted.highlight|first][diff.deleted| line]
-  [diff.deleted|-this is the second line]
-  [diff.deleted|-][diff.deleted.highlight|    ][diff.deleted|third line starts with space]
-  [diff.deleted|-][diff.deleted.highlight|+][diff.deleted| starts with a ][diff.deleted.highlight|plus][diff.deleted| sign]
-  [diff.deleted|-][diff.tab|	][diff.deleted|this one with ][diff.deleted.highlight|one][diff.deleted| tab]
-  [diff.deleted|-][diff.tab|		][diff.deleted|now with full ][diff.deleted.highlight|two][diff.deleted| tabs]
-  [diff.deleted|-][diff.tab|	][diff.deleted|now tabs][diff.tab|		][diff.deleted|everywhere, much fun]
-  [diff.inserted|+that is the first paragraph]
-  [diff.inserted|+][diff.inserted.highlight|    ][diff.inserted|this is the ][diff.inserted.highlight|second][diff.inserted| line]
-  [diff.inserted|+third line starts with space]
-  [diff.inserted|+][diff.inserted.highlight|-][diff.inserted| starts with a ][diff.inserted.highlight|minus][diff.inserted| sign]
-  [diff.inserted|+][diff.tab|	][diff.inserted|this one with ][diff.inserted.highlight|two][diff.inserted| tab]
-  [diff.inserted|+][diff.tab|			][diff.inserted|now with full ][diff.inserted.highlight|three][diff.inserted| tabs]
-  [diff.inserted|+][diff.tab|	][diff.inserted|now][diff.inserted.highlight| there are][diff.inserted| tabs][diff.tab|		][diff.inserted|everywhere, much fun]
+  [diff.deleted|-][diff.deleted.changed|this][diff.deleted.unchanged| is the first ][diff.deleted.changed|line]
+  [diff.deleted|-][diff.deleted.unchanged|this is the second line]
+  [diff.deleted|-][diff.deleted.changed|    ][diff.deleted.unchanged|third line starts with space]
+  [diff.deleted|-][diff.deleted.changed|+][diff.deleted.unchanged| starts with a ][diff.deleted.changed|plus][diff.deleted.unchanged| sign]
+  [diff.deleted|-][diff.tab|	][diff.deleted.unchanged|this one with ][diff.deleted.changed|one][diff.deleted.unchanged| tab]
+  [diff.deleted|-][diff.tab|		][diff.deleted.unchanged|now with full ][diff.deleted.changed|two][diff.deleted.unchanged| tabs]
+  [diff.deleted|-][diff.tab|	][diff.deleted.unchanged|now ][diff.deleted.unchanged|tabs][diff.tab|		][diff.deleted.unchanged|everywhere, much fun]
+  [diff.inserted|+][diff.inserted.changed|that][diff.inserted.unchanged| is the first ][diff.inserted.changed|paragraph]
+  [diff.inserted|+][diff.inserted.changed|    ][diff.inserted.unchanged|this is the second line]
+  [diff.inserted|+][diff.inserted.unchanged|third line starts with space]
+  [diff.inserted|+][diff.inserted.changed|-][diff.inserted.unchanged| starts with a ][diff.inserted.changed|minus][diff.inserted.unchanged| sign]
+  [diff.inserted|+][diff.tab|	][diff.inserted.unchanged|this one with ][diff.inserted.changed|two][diff.inserted.unchanged| tab]
+  [diff.inserted|+][diff.tab|			][diff.inserted.unchanged|now with full ][diff.inserted.changed|three][diff.inserted.unchanged| tabs]
+  [diff.inserted|+][diff.tab|	][diff.inserted.unchanged|now ][diff.inserted.changed|there are ][diff.inserted.unchanged|tabs][diff.tab|		][diff.inserted.unchanged|everywhere, much fun]
    
    this line won't change
    
    two lines are going to
-  [diff.deleted|-be changed into ][diff.deleted.highlight|three][diff.deleted|!]
-  [diff.inserted|+(entirely magically,]
-  [diff.inserted|+ assuming this works)]
-  [diff.inserted|+be changed into ][diff.inserted.highlight|four][diff.inserted|!]
+  [diff.deleted|-][diff.deleted.unchanged|be changed into ][diff.deleted.changed|three][diff.deleted.unchanged|!]
+  [diff.inserted|+][diff.inserted.changed|(entirely magically,]
+  [diff.inserted|+][diff.inserted.changed| assuming this works)]
+  [diff.inserted|+][diff.inserted.unchanged|be changed into ][diff.inserted.changed|four][diff.inserted.unchanged|!]
    
-  [diff.deleted|-three of those lines ][diff.deleted.highlight|will]
-  [diff.deleted|-][diff.deleted.highlight|collapse][diff.deleted| onto one]
-  [diff.deleted|-(to see if it works)]
-  [diff.inserted|+three of those lines ][diff.inserted.highlight|have]
-  [diff.inserted|+][diff.inserted.highlight|collapsed][diff.inserted| onto one]
-#endif
+  [diff.deleted|-][diff.deleted.unchanged|three of those lines ][diff.deleted.changed|will]
+  [diff.deleted|-][diff.deleted.changed|collapse][diff.deleted.unchanged| onto one]
+  [diff.deleted|-][diff.deleted.changed|(to see if it works)]
+  [diff.inserted|+][diff.inserted.unchanged|three of those lines ][diff.inserted.changed|have]
+  [diff.inserted|+][diff.inserted.changed|collapsed][diff.inserted.unchanged| onto one]
 
 multibyte character shouldn't be broken up in word diff:
 
@@ -386,12 +384,10 @@ 
   > EOF
   $ hg ci -m 'slightly change utf8 char' utf8
 
-#if false
   $ hg diff --config experimental.worddiff=True --color=debug -c.
   [diff.diffline|diff --git a/utf8 b/utf8]
   [diff.file_a|--- a/utf8]
   [diff.file_b|+++ b/utf8]
   [diff.hunk|@@ -1,1 +1,1 @@]
-  [diff.deleted|-blah ][diff.deleted.highlight|\xe3\x82\xa2][diff.deleted| blah] (esc)
-  [diff.inserted|+blah ][diff.inserted.highlight|\xe3\x82\xa4][diff.inserted| blah] (esc)
-#endif
+  [diff.deleted|-][diff.deleted.unchanged|blah ][diff.deleted.changed|\xe3\x82\xa2][diff.deleted.unchanged| blah] (esc)
+  [diff.inserted|+][diff.inserted.unchanged|blah ][diff.inserted.changed|\xe3\x82\xa4][diff.inserted.unchanged| blah] (esc)
diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -50,7 +50,8 @@ 
 
 gitre = re.compile(br'diff --git a/(.*) b/(.*)')
 tabsplitter = re.compile(br'(\t+|[^\t]+)')
-_nonwordre = re.compile(br'([^a-zA-Z0-9_\x80-\xff])')
+wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
+                          '[^ \ta-zA-Z0-9_\x80-\xff])')
 
 PatchError = error.PatchError
 
@@ -2498,8 +2499,78 @@ 
         if chompline != line:
             yield (line[len(chompline):], '')
 
+def diffsinglehunkinline(hunklines):
+    """yield tokens for a list of lines in a single hunk, with inline colors"""
+    # prepare deleted, and inserted content
+    a = ''
+    b = ''
+    for line in hunklines:
+        if line[0] == '-':
+            a += line[1:]
+        elif line[0] == '+':
+            b += line[1:]
+        else:
+            raise error.ProgrammingError('unexpected hunk line: %s' % line)
+    # fast path: if either side is empty, use diffsinglehunk
+    if not a or not b:
+        for t in diffsinglehunk(hunklines):
+            yield t
+        return
+    # re-split the content into words
+    al = wordsplitter.findall(a)
+    bl = wordsplitter.findall(b)
+    # re-arrange the words to lines since the diff algorithm is line-based
+    aln = [s if s == '\n' else s + '\n' for s in al]
+    bln = [s if s == '\n' else s + '\n' for s in bl]
+    an = ''.join(aln)
+    bn = ''.join(bln)
+    # run the diff algorithm, prepare atokens and btokens
+    atokens = []
+    btokens = []
+    blocks = mdiff.allblocks(an, bn, lines1=aln, lines2=bln)
+    for (a1, a2, b1, b2), btype in blocks:
+        changed = btype == '!'
+        for token in mdiff.splitnewlines(''.join(al[a1:a2])):
+            atokens.append((changed, token))
+        for token in mdiff.splitnewlines(''.join(bl[b1:b2])):
+            btokens.append((changed, token))
+
+    # yield deleted tokens, then inserted ones
+    for prefix, label, tokens in [('-', 'diff.deleted', atokens),
+                                  ('+', 'diff.inserted', btokens)]:
+        nextisnewline = True
+        for changed, token in tokens:
+            if nextisnewline:
+                yield (prefix, label)
+                nextisnewline = False
+            # special handling line end
+            isendofline = token.endswith('\n')
+            if isendofline:
+                chomp = token[:-1] # chomp
+                token = chomp.rstrip() # detect spaces at the end
+                endspaces = chomp[len(token):]
+            # scan tabs
+            for maybetab in tabsplitter.findall(token):
+                if '\t' == maybetab[0]:
+                    currentlabel = 'diff.tab'
+                else:
+                    if changed:
+                        currentlabel = label + '.changed'
+                    else:
+                        currentlabel = label + '.unchanged'
+                yield (maybetab, currentlabel)
+            if isendofline:
+                if endspaces:
+                    yield (endspaces, 'diff.trailingwhitespace')
+                yield ('\n', '')
+                nextisnewline = True
+
 def difflabel(func, *args, **kw):
     '''yields 2-tuples of (output, label) based on the output of func()'''
+    if kw.get(r'opts') and kw[r'opts'].worddiff:
+        dodiffhunk = diffsinglehunkinline
+    else:
+        dodiffhunk = diffsinglehunk
     headprefixes = [('diff', 'diff.diffline'),
                     ('copy', 'diff.extended'),
                     ('rename', 'diff.extended'),
@@ -2519,7 +2590,7 @@ 
     hunkbuffer = []
     def consumehunkbuffer():
         if hunkbuffer:
-            for token in diffsinglehunk(hunkbuffer):
+            for token in dodiffhunk(hunkbuffer):
                 yield token
             hunkbuffer[:] = []
 
diff --git a/mercurial/color.py b/mercurial/color.py
--- a/mercurial/color.py
+++ b/mercurial/color.py
@@ -90,14 +90,16 @@ 
     'branches.inactive': 'none',
     'diff.changed': 'white',
     'diff.deleted': 'red',
-    'diff.deleted.highlight': 'red bold underline',
+    'diff.deleted.changed': 'red',
+    'diff.deleted.unchanged': 'red dim',
     'diff.diffline': 'bold',
     'diff.extended': 'cyan bold',
     'diff.file_a': 'red bold',
     'diff.file_b': 'green bold',
     'diff.hunk': 'magenta',
     'diff.inserted': 'green',
-    'diff.inserted.highlight': 'green bold underline',
+    'diff.inserted.changed': 'green',
+    'diff.inserted.unchanged': 'green dim',
     'diff.tab': '',
     'diff.trailingwhitespace': 'bold red_background',
     'changeset.public': '',