Patchwork [3,of,3,v2] filemerge: exclude some markup languages from conflict marker detection

login
register
mail settings
Submitter Gregory Szorc
Date Aug. 31, 2014, 7:56 a.m.
Message ID <ab2e712ecf4e79a59c79.1409471774@gps-mbp.local>
Download mbox | patch
Permalink /patch/5649/
State Accepted
Headers show

Comments

Gregory Szorc - Aug. 31, 2014, 7:56 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1409470817 -7200
#      Sun Aug 31 09:40:17 2014 +0200
# Node ID ab2e712ecf4e79a59c7985359b00d21d0d39cfd6
# Parent  35c1d268bbd092ffb15b85605b4f5d36c3d08986
filemerge: exclude some markup languages from conflict marker detection

Detection of conflict markers looks for the '=======' string, which may
occur in markup languages such as markdown and restructured text.
We omit this part of conflict marker detection for these files.
Augie Fackler - Aug. 31, 2014, 8:46 a.m.
On Aug 31, 2014, at 9:56 AM, Gregory Szorc <gregory.szorc@gmail.com> wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1409470817 -7200
> #      Sun Aug 31 09:40:17 2014 +0200
> # Node ID ab2e712ecf4e79a59c7985359b00d21d0d39cfd6
> # Parent  35c1d268bbd092ffb15b85605b4f5d36c3d08986
> filemerge: exclude some markup languages from conflict marker detection
> 
> Detection of conflict markers looks for the '=======' string, which may
> occur in markup languages such as markdown and restructured text.
> We omit this part of conflict marker detection for these files.
> 
> diff --git a/mercurial/filemerge.py b/mercurial/filemerge.py
> --- a/mercurial/filemerge.py
> +++ b/mercurial/filemerge.py
> @@ -464,11 +464,25 @@ def filemerge(repo, mynode, orig, fcd, f
>     util.unlink(b)
>     util.unlink(c)
>     return r
> 
> +_includeequals = (
> +    '.markdown',
> +    '.md',
> +    '.rst',
> +)

This worries me a little, since it's a hardcoded list (rather than configurable).

Maybe we could look at the base texts for left and right and only warn if the base text doesn't look like it contains conflict markers?

> +
> def hasconflictmarkers(fctx):
>     """Whether data from a filecontext has conflict markers."""
> -    return re.search("^(<<<<<<< .*|=======|>>>>>>> .*)$", fctx.data(),
> -                     re.MULTILINE)
> +    path = fctx.path()
> +    regexp = '^(<<<<<<< .*|>>>>>>> .*'
> +    # Some files commonly contain "=======". We try to avoid false positives.
> +    for suffix in _includeequals:
> +        if path.endswith(suffix):
> +            regexp += '|======='
> +            break
> +    regexp += ')$'
> +
> +    return re.search(regexp, fctx.data(), re.MULTILINE)
> 
> # tell hggettext to extract docstrings from these functions:
> i18nfunctions = internals.values()
> diff --git a/tests/test-resolve.t b/tests/test-resolve.t
> --- a/tests/test-resolve.t
> +++ b/tests/test-resolve.t
> @@ -81,4 +81,45 @@ conflict markers in file marked as resol
>> EOF
>   $ hg resolve -m file
>   (file appears to contain conflict markers; did you intend to mark as resolved?)
>   (no more unresolved files)
> +
> +  $ cd ..
> +
> +conflict marker warning is ignored for files with those patterns
> +
> +  $ hg init markdown
> +  $ cd markdown
> +  $ cat >> readme.md << EOF
> +  > == Index
> +  > Hello
> +  > ========= Another section
> +  > more content
> +  > EOF
> +  $ cat >> readme.rst << EOF
> +  > ========
> +  > My Index
> +  > ========
> +  > EOF
> +  $ hg commit -A -m 'add readme'
> +  adding readme.md
> +  adding readme.rst
> +  $ echo 'even more content' >> readme.md
> +  $ echo 'even more content' >> readme.rst
> +  $ hg commit -m 'append readme'
> +
> +  $ hg up -r 0
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  $ echo 'alternative content' >> readme.md
> +  $ echo 'alternative content' >> readme.rst
> +  $ hg commit -m 'other append'
> +  created new head
> +
> +  $ hg merge --tool=internal:fail
> +  0 files updated, 0 files merged, 0 files removed, 2 files unresolved
> +  use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon
> +  [1]
> +
> +  $ hg resolve -m
> +  (no more unresolved files)
> +
> +  $ cd ..
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - Sept. 2, 2014, 7:11 p.m.
On 08/31/2014 10:46 AM, Augie Fackler wrote:
>
> On Aug 31, 2014, at 9:56 AM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1409470817 -7200
>> #      Sun Aug 31 09:40:17 2014 +0200
>> # Node ID ab2e712ecf4e79a59c7985359b00d21d0d39cfd6
>> # Parent  35c1d268bbd092ffb15b85605b4f5d36c3d08986
>> filemerge: exclude some markup languages from conflict marker detection
>>
>> Detection of conflict markers looks for the '=======' string, which may
>> occur in markup languages such as markdown and restructured text.
>> We omit this part of conflict marker detection for these files.
>>
>> diff --git a/mercurial/filemerge.py b/mercurial/filemerge.py
>> --- a/mercurial/filemerge.py
>> +++ b/mercurial/filemerge.py
>> @@ -464,11 +464,25 @@ def filemerge(repo, mynode, orig, fcd, f
>>      util.unlink(b)
>>      util.unlink(c)
>>      return r
>>
>> +_includeequals = (
>> +    '.markdown',
>> +    '.md',
>> +    '.rst',
>> +)
>
> This worries me a little, since it's a hardcoded list (rather than configurable).
>
> Maybe we could look at the base texts for left and right and only warn if the base text doesn't look like it contains conflict markers?

The whitelisting of some file extension seems a bad idea to me. I think 
it is too narrow and too fragile to worth diverging from main behavior.

Markers detections are also misled by file containing:
- documentation about conflict markers,
- tests about conflict markers,
- configuration about conflict markers,
- code to handle conflict markers.

Mercurial is probably a pathological project when it comes to this files.

I still love the idea in general and I feel like the few false positive 
can suffer the warning.

Patch

diff --git a/mercurial/filemerge.py b/mercurial/filemerge.py
--- a/mercurial/filemerge.py
+++ b/mercurial/filemerge.py
@@ -464,11 +464,25 @@  def filemerge(repo, mynode, orig, fcd, f
     util.unlink(b)
     util.unlink(c)
     return r
 
+_includeequals = (
+    '.markdown',
+    '.md',
+    '.rst',
+)
+
 def hasconflictmarkers(fctx):
     """Whether data from a filecontext has conflict markers."""
-    return re.search("^(<<<<<<< .*|=======|>>>>>>> .*)$", fctx.data(),
-                     re.MULTILINE)
+    path = fctx.path()
+    regexp = '^(<<<<<<< .*|>>>>>>> .*'
+    # Some files commonly contain "=======". We try to avoid false positives.
+    for suffix in _includeequals:
+        if path.endswith(suffix):
+            regexp += '|======='
+            break
+    regexp += ')$'
+
+    return re.search(regexp, fctx.data(), re.MULTILINE)
 
 # tell hggettext to extract docstrings from these functions:
 i18nfunctions = internals.values()
diff --git a/tests/test-resolve.t b/tests/test-resolve.t
--- a/tests/test-resolve.t
+++ b/tests/test-resolve.t
@@ -81,4 +81,45 @@  conflict markers in file marked as resol
   > EOF
   $ hg resolve -m file
   (file appears to contain conflict markers; did you intend to mark as resolved?)
   (no more unresolved files)
+
+  $ cd ..
+
+conflict marker warning is ignored for files with those patterns
+
+  $ hg init markdown
+  $ cd markdown
+  $ cat >> readme.md << EOF
+  > == Index
+  > Hello
+  > ========= Another section
+  > more content
+  > EOF
+  $ cat >> readme.rst << EOF
+  > ========
+  > My Index
+  > ========
+  > EOF
+  $ hg commit -A -m 'add readme'
+  adding readme.md
+  adding readme.rst
+  $ echo 'even more content' >> readme.md
+  $ echo 'even more content' >> readme.rst
+  $ hg commit -m 'append readme'
+
+  $ hg up -r 0
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ echo 'alternative content' >> readme.md
+  $ echo 'alternative content' >> readme.rst
+  $ hg commit -m 'other append'
+  created new head
+
+  $ hg merge --tool=internal:fail
+  0 files updated, 0 files merged, 0 files removed, 2 files unresolved
+  use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon
+  [1]
+
+  $ hg resolve -m
+  (no more unresolved files)
+
+  $ cd ..