Patchwork [RFC] highlight: add highlightfiles config option, which takes a fileset (issue3005)

login
register
mail settings
Submitter Anton Shestakov
Date Sept. 16, 2015, 2:36 p.m.
Message ID <ee1b85c5616ece49f530.1442414217@neuro>
Download mbox | patch
Permalink /patch/10513/
State Accepted
Headers show

Comments

Anton Shestakov - Sept. 16, 2015, 2:36 p.m.
# HG changeset patch
# User Anton Shestakov <av6@dwimlabs.net>
# Date 1442413836 -28800
#      Wed Sep 16 22:30:36 2015 +0800
# Node ID ee1b85c5616ece49f53095adfe85e90c3efeb48b
# Parent  7df5d476087392e217699a41c11fbe8cd48713b2
highlight: add highlightfiles config option, which takes a fileset (issue3005)

Highlight extension lacked a way to limit files by size, by extension, and/or
by any other part of file path. A good solution would be to use a fileset,
since it can check file path, extension and size (and more) in one expression.
So this change introduces such an option, highlighfiles, which takes a fileset
and on each request decides if the requested file should be highlighted.

The default "size('<5M')" is, in a way, suggested in issue3005.

checkfctx() limits the amount of work to just one file (subset kwarg in
fileset.matchctx()).

Monkey-patching works around issue4568, otherwise using filesets here while
running hgweb in directory mode would say, for example, "Abort: **.py not under
root", but this fix is very local and probably far from ideal. I suspect there
to be a way to fix this for the whole hgweb and resolve the issue, but I don't
know how to do it.
Matt Mackall - Sept. 16, 2015, 10:51 p.m.
On Wed, 2015-09-16 at 22:36 +0800, Anton Shestakov wrote:
> # HG changeset patch
> # User Anton Shestakov <av6@dwimlabs.net>
> # Date 1442413836 -28800
> #      Wed Sep 16 22:30:36 2015 +0800
> # Node ID ee1b85c5616ece49f53095adfe85e90c3efeb48b
> # Parent  7df5d476087392e217699a41c11fbe8cd48713b2
> highlight: add highlightfiles config option, which takes a fileset (issue3005)

Looks good to me, queued for default.
Yuya Nishihara - Sept. 17, 2015, 1:28 p.m.
On Wed, 16 Sep 2015 22:36:57 +0800, Anton Shestakov wrote:
> # HG changeset patch
> # User Anton Shestakov <av6@dwimlabs.net>
> # Date 1442413836 -28800
> #      Wed Sep 16 22:30:36 2015 +0800
> # Node ID ee1b85c5616ece49f53095adfe85e90c3efeb48b
> # Parent  7df5d476087392e217699a41c11fbe8cd48713b2
> highlight: add highlightfiles config option, which takes a fileset (issue3005)
> 
> Highlight extension lacked a way to limit files by size, by extension, and/or
> by any other part of file path. A good solution would be to use a fileset,
> since it can check file path, extension and size (and more) in one expression.
> So this change introduces such an option, highlighfiles, which takes a fileset
> and on each request decides if the requested file should be highlighted.
> 
> The default "size('<5M')" is, in a way, suggested in issue3005.
> 
> checkfctx() limits the amount of work to just one file (subset kwarg in
> fileset.matchctx()).
> 
> Monkey-patching works around issue4568, otherwise using filesets here while
> running hgweb in directory mode would say, for example, "Abort: **.py not under
> root", but this fix is very local and probably far from ideal. I suspect there
> to be a way to fix this for the whole hgweb and resolve the issue, but I don't
> know how to do it.

> +def checkfctx(fctx, expr):
> +    ctx = fctx.changectx()
> +    tree = fileset.parse(expr)
> +    mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
> +    repo = ctx.repo()
> +    # To allow matching file names in the fileset in hgweb directory mode.
> +    # See issue4568.
> +    object.__setattr__(repo, 'getcwd', lambda: repo.root)

It monkey-patches a repo object which lives long in the current thread. So
once you've visited a highlighted page, repo.getcwd() is faked after that.
I'm not sure if it can cause a real problem, but it smells bad.
Matt Mackall - Sept. 18, 2015, 12:47 a.m.
On Thu, 2015-09-17 at 22:28 +0900, Yuya Nishihara wrote:
> On Wed, 16 Sep 2015 22:36:57 +0800, Anton Shestakov wrote:
> > # HG changeset patch
> > # User Anton Shestakov <av6@dwimlabs.net>
> > # Date 1442413836 -28800
> > #      Wed Sep 16 22:30:36 2015 +0800
> > # Node ID ee1b85c5616ece49f53095adfe85e90c3efeb48b
> > # Parent  7df5d476087392e217699a41c11fbe8cd48713b2
> > highlight: add highlightfiles config option, which takes a fileset (issue3005)
> > 
> > Highlight extension lacked a way to limit files by size, by extension, and/or
> > by any other part of file path. A good solution would be to use a fileset,
> > since it can check file path, extension and size (and more) in one expression.
> > So this change introduces such an option, highlighfiles, which takes a fileset
> > and on each request decides if the requested file should be highlighted.
> > 
> > The default "size('<5M')" is, in a way, suggested in issue3005.
> > 
> > checkfctx() limits the amount of work to just one file (subset kwarg in
> > fileset.matchctx()).
> > 
> > Monkey-patching works around issue4568, otherwise using filesets here while
> > running hgweb in directory mode would say, for example, "Abort: **.py not under
> > root", but this fix is very local and probably far from ideal. I suspect there
> > to be a way to fix this for the whole hgweb and resolve the issue, but I don't
> > know how to do it.
> 
> > +def checkfctx(fctx, expr):
> > +    ctx = fctx.changectx()
> > +    tree = fileset.parse(expr)
> > +    mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
> > +    repo = ctx.repo()
> > +    # To allow matching file names in the fileset in hgweb directory mode.
> > +    # See issue4568.
> > +    object.__setattr__(repo, 'getcwd', lambda: repo.root)
> 
> It monkey-patches a repo object which lives long in the current thread. So
> once you've visited a highlighted page, repo.getcwd() is faked after that.
> I'm not sure if it can cause a real problem, but it smells bad.

It does indeed. On the other hand, hgweb should probably be doing
something like this itself because we never want the matcher to care
about what directory the webserver is running in.
Yuya Nishihara - Sept. 18, 2015, 12:49 p.m.
On Thu, 17 Sep 2015 17:47:03 -0700, Matt Mackall wrote:
> On Thu, 2015-09-17 at 22:28 +0900, Yuya Nishihara wrote:
> > On Wed, 16 Sep 2015 22:36:57 +0800, Anton Shestakov wrote:
> > > # HG changeset patch
> > > # User Anton Shestakov <av6@dwimlabs.net>
> > > # Date 1442413836 -28800
> > > #      Wed Sep 16 22:30:36 2015 +0800
> > > # Node ID ee1b85c5616ece49f53095adfe85e90c3efeb48b
> > > # Parent  7df5d476087392e217699a41c11fbe8cd48713b2
> > > highlight: add highlightfiles config option, which takes a fileset (issue3005)
> > > 
> > > Highlight extension lacked a way to limit files by size, by extension, and/or
> > > by any other part of file path. A good solution would be to use a fileset,
> > > since it can check file path, extension and size (and more) in one expression.
> > > So this change introduces such an option, highlighfiles, which takes a fileset
> > > and on each request decides if the requested file should be highlighted.
> > > 
> > > The default "size('<5M')" is, in a way, suggested in issue3005.
> > > 
> > > checkfctx() limits the amount of work to just one file (subset kwarg in
> > > fileset.matchctx()).
> > > 
> > > Monkey-patching works around issue4568, otherwise using filesets here while
> > > running hgweb in directory mode would say, for example, "Abort: **.py not under
> > > root", but this fix is very local and probably far from ideal. I suspect there
> > > to be a way to fix this for the whole hgweb and resolve the issue, but I don't
> > > know how to do it.
> > 
> > > +def checkfctx(fctx, expr):
> > > +    ctx = fctx.changectx()
> > > +    tree = fileset.parse(expr)
> > > +    mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
> > > +    repo = ctx.repo()
> > > +    # To allow matching file names in the fileset in hgweb directory mode.
> > > +    # See issue4568.
> > > +    object.__setattr__(repo, 'getcwd', lambda: repo.root)
> > 
> > It monkey-patches a repo object which lives long in the current thread. So
> > once you've visited a highlighted page, repo.getcwd() is faked after that.
> > I'm not sure if it can cause a real problem, but it smells bad.
> 
> It does indeed. On the other hand, hgweb should probably be doing
> something like this itself because we never want the matcher to care
> about what directory the webserver is running in.

Can we assume that repo.getcwd() and dirstate.getcwd() aren't used for
getting real cwd? It seems these functions are used only for matcher or
display paths, but it is hard to guess from the function name.

Patch

diff --git a/hgext/highlight/__init__.py b/hgext/highlight/__init__.py
--- a/hgext/highlight/__init__.py
+++ b/hgext/highlight/__init__.py
@@ -13,23 +13,32 @@ 
 It depends on the Pygments syntax highlighting library:
 http://pygments.org/
 
-There is a single configuration option::
+There are two configuration options::
 
   [web]
-  pygments_style = <style>
-
-The default is 'colorful'.
+  pygments_style = <style> (default: colorful)
+  highlightfiles = <fileset> (default: size('<5M'))
 """
 
 import highlight
 from mercurial.hgweb import webcommands, webutil, common
-from mercurial import extensions, encoding
+from mercurial import extensions, encoding, fileset
 # Note for extension authors: ONLY specify testedwith = 'internal' for
 # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
 # be specifying the version(s) of Mercurial they are tested with, or
 # leave the attribute unspecified.
 testedwith = 'internal'
 
+def checkfctx(fctx, expr):
+    ctx = fctx.changectx()
+    tree = fileset.parse(expr)
+    mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
+    repo = ctx.repo()
+    # To allow matching file names in the fileset in hgweb directory mode.
+    # See issue4568.
+    object.__setattr__(repo, 'getcwd', lambda: repo.root)
+    return fctx.path() in fileset.getset(mctx, tree)
+
 def filerevision_highlight(orig, web, req, tmpl, fctx):
     mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
     # only pygmentize for mimetype containing 'html' so we both match
@@ -41,7 +50,9 @@  def filerevision_highlight(orig, web, re
     # pygmentize a html file
     if 'html' in mt:
         style = web.config('web', 'pygments_style', 'colorful')
-        highlight.pygmentize('fileline', fctx, style, tmpl)
+        expr = web.config('web', 'highlightfiles', "size('<5M')")
+        if checkfctx(fctx, expr):
+            highlight.pygmentize('fileline', fctx, style, tmpl)
     return orig(web, req, tmpl, fctx)
 
 def annotate_highlight(orig, web, req, tmpl):
@@ -49,7 +60,9 @@  def annotate_highlight(orig, web, req, t
     if 'html' in mt:
         fctx = webutil.filectx(web.repo, req)
         style = web.config('web', 'pygments_style', 'colorful')
-        highlight.pygmentize('annotateline', fctx, style, tmpl)
+        expr = web.config('web', 'highlightfiles', "size('<5M')")
+        if checkfctx(fctx, expr):
+            highlight.pygmentize('annotateline', fctx, style, tmpl)
     return orig(web, req, tmpl)
 
 def generate_css(web, req, tmpl):
diff --git a/tests/test-highlight.t b/tests/test-highlight.t
--- a/tests/test-highlight.t
+++ b/tests/test-highlight.t
@@ -5,6 +5,7 @@ 
   > highlight =
   > [web]
   > pygments_style = friendly
+  > highlightfiles = **.py and size('<100KB')
   > EOF
   $ hg init test
   $ cd test
@@ -590,6 +591,28 @@  hgweb highlightcss fruity
 errors encountered
 
   $ cat errors.log
+  $ killdaemons.py
+
+only highlight C source files
+
+  $ cat > .hg/hgrc <<EOF
+  > [web]
+  > highlightfiles = **.c
+  > EOF
+
+hg serve again
+
+  $ hg serve -p $HGPORT -d -n test --pid-file=hg.pid -A access.log -E errors.log
+  $ cat hg.pid >> $DAEMON_PIDS
+
+test that fileset in highlightfiles works and primes.py is not highlighted
+
+  $ get-with-headers.py localhost:$HGPORT 'file/tip/primes.py' | grep 'id="l11"'
+  <span id="l11">def primes():</span><a href="#l11"></a>
+
+errors encountered
+
+  $ cat errors.log
   $ cd ..
   $ hg init eucjp
   $ cd eucjp