Patchwork [3,of,3] highlight: don't highlight files that have fancy linebreaks (issue4291)

login
register
mail settings
Submitter Augie Fackler
Date Dec. 17, 2014, 4:16 p.m.
Message ID <41fb6e503cd6e08fd4da.1418832971@augie-macbookair2.roam.corp.google.com>
Download mbox | patch
Permalink /patch/7145/
State Accepted
Headers show

Comments

Augie Fackler - Dec. 17, 2014, 4:16 p.m.
# HG changeset patch
# User Augie Fackler <raf@durin42.com>
# Date 1418764197 18000
#      Tue Dec 16 16:09:57 2014 -0500
# Node ID 41fb6e503cd6e08fd4dac4a1c3f0a55818708f3f
# Parent  00cf82b3d1453640f454d7f8d628e501228cf391
highlight: don't highlight files that have fancy linebreaks (issue4291)

Spotted this on a file with a form feed in the cpython repository. I'm
not sure how to make the lines line up correctly, so for now we'll
disable highlighting if the file contains such an exotic linebreak.
Pierre-Yves David - Dec. 17, 2014, 8:34 p.m.
On 12/17/2014 08:16 AM, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <raf@durin42.com>
> # Date 1418764197 18000
> #      Tue Dec 16 16:09:57 2014 -0500
> # Node ID 41fb6e503cd6e08fd4dac4a1c3f0a55818708f3f
> # Parent  00cf82b3d1453640f454d7f8d628e501228cf391
> highlight: don't highlight files that have fancy linebreaks (issue4291)

Theses three are pushed to the clowncopter. Thanks.
Matt Mackall - Dec. 18, 2014, 12:33 a.m.
On Wed, 2014-12-17 at 12:34 -0800, Pierre-Yves David wrote:
> 
> On 12/17/2014 08:16 AM, Augie Fackler wrote:
> > # HG changeset patch
> > # User Augie Fackler <raf@durin42.com>
> > # Date 1418764197 18000
> > #      Tue Dec 16 16:09:57 2014 -0500
> > # Node ID 41fb6e503cd6e08fd4dac4a1c3f0a55818708f3f
> > # Parent  00cf82b3d1453640f454d7f8d628e501228cf391
> > highlight: don't highlight files that have fancy linebreaks (issue4291)
> 
> Theses three are pushed to the clowncopter. Thanks.

As I posted on the issue, I don't like this approach. Unfortunately I
didn't spot the patches for another 7 hours. We've replaced a mysterious
missing line with an even more mysterious failure to highlight at all.
We should instead filter out the problematic non-printing characters; a 
browser's not going to usefully display them or reliably cut and paste
them anyway.
Sean Farley - Dec. 18, 2014, 4:17 a.m.
Matt Mackall writes:

> On Wed, 2014-12-17 at 12:34 -0800, Pierre-Yves David wrote:
>> 
>> On 12/17/2014 08:16 AM, Augie Fackler wrote:
>> > # HG changeset patch
>> > # User Augie Fackler <raf@durin42.com>
>> > # Date 1418764197 18000
>> > #      Tue Dec 16 16:09:57 2014 -0500
>> > # Node ID 41fb6e503cd6e08fd4dac4a1c3f0a55818708f3f
>> > # Parent  00cf82b3d1453640f454d7f8d628e501228cf391
>> > highlight: don't highlight files that have fancy linebreaks (issue4291)
>> 
>> Theses three are pushed to the clowncopter. Thanks.
>
> As I posted on the issue, I don't like this approach. Unfortunately I
> didn't spot the patches for another 7 hours. We've replaced a mysterious
> missing line with an even more mysterious failure to highlight at all.
> We should instead filter out the problematic non-printing characters; a 
> browser's not going to usefully display them or reliably cut and paste
> them anyway.

Just a project infrastructure question: since both clowncopter and you
use evolve, can't you reject (prune) patches after pulling from
clowncopter?
Pierre-Yves David - Dec. 18, 2014, 4:18 a.m.
On 12/17/2014 08:17 PM, Sean Farley wrote:
>
> Matt Mackall writes:
>
>> On Wed, 2014-12-17 at 12:34 -0800, Pierre-Yves David wrote:
>>>
>>> On 12/17/2014 08:16 AM, Augie Fackler wrote:
>>>> # HG changeset patch
>>>> # User Augie Fackler <raf@durin42.com>
>>>> # Date 1418764197 18000
>>>> #      Tue Dec 16 16:09:57 2014 -0500
>>>> # Node ID 41fb6e503cd6e08fd4dac4a1c3f0a55818708f3f
>>>> # Parent  00cf82b3d1453640f454d7f8d628e501228cf391
>>>> highlight: don't highlight files that have fancy linebreaks (issue4291)
>>>
>>> Theses three are pushed to the clowncopter. Thanks.
>>
>> As I posted on the issue, I don't like this approach. Unfortunately I
>> didn't spot the patches for another 7 hours. We've replaced a mysterious
>> missing line with an even more mysterious failure to highlight at all.
>> We should instead filter out the problematic non-printing characters; a
>> browser's not going to usefully display them or reliably cut and paste
>> them anyway.
>
> Just a project infrastructure question: since both clowncopter and you
> use evolve, can't you reject (prune) patches after pulling from
> clowncopter?

He can and he will.

>

Patch

diff --git a/hgext/highlight/highlight.py b/hgext/highlight/highlight.py
--- a/hgext/highlight/highlight.py
+++ b/hgext/highlight/highlight.py
@@ -34,7 +34,13 @@  def pygmentize(field, fctx, style, tmpl)
 
     # Pygments is best used with Unicode strings:
     # <http://pygments.org/docs/unicode/>
+    numlines = len(text.splitlines())
     text = text.decode(encoding.encoding, 'replace')
+    if numlines != len(text.splitlines()):
+        # Some characters count as line separators to unicode but not
+        # bytes, notably U+000C FORM FEED. If one of those is present,
+        # the lines won't line up and highlight will omit lines.
+        return
 
     # To get multi-line strings right, we can't format line-by-line
     try:
diff --git a/tests/test-highlight.t b/tests/test-highlight.t
--- a/tests/test-highlight.t
+++ b/tests/test-highlight.t
@@ -186,8 +186,9 @@  hgweb filerevision, html
   </body>
   </html>
   
-file with exotic linebreak (currently broken: the
-last line of the file is missing)
+A file with exotic linebreak is not highlighted because highlighting
+causes us to skip lines at the end of the file. It would be better if
+this could be fixed, but the way to do that is unclear.
 
   $ ("$TESTDIR/get-with-headers.py" localhost:$HGPORT 'file/tip/with-ff.py') \
   >     | sed "s/class=\"k\"/class=\"kn\"/g" | sed "s/class=\"mf\"/class=\"mi\"/g"
@@ -273,8 +274,8 @@  last line of the file is missing)
   <div class="sourcefirst linewraptoggle">line wrap: <a class="linewraplink" href="javascript:toggleLinewrap()">on</a></div>
   <div class="sourcefirst"> line source</div>
   <pre class="sourcelines stripes4 wrap">
-  <span id="l1"><span class="n">x</span> <span class="o">=</span> <span class="mi">1</span></span><a href="#l1"></a>
-  <span id="l2"><span class="n">x</span><span class="o">=</span><span class="mi">2</span></span><a href="#l2"></a></pre>
+  <span id="l1">x = 1</span><a href="#l1"></a>
+  <span id="l2">x=2\x0cx=&quot;three&quot;</span><a href="#l2"></a></pre> (esc)
   <div class="sourcelast"></div>
   </div>
   </div>