Patchwork [PoC] extensions: PoC readme extension for hgweb

Submitter Anton Shestakov
Date May 25, 2018, 12:12 p.m.
Message ID <8133b5cebe9bcc24d91a.1527250323@usbssd.lan>
Permalink /patch/31850/
State New

Comments

Anton Shestakov - May 25, 2018, 12:12 p.m.
# HG changeset patch
# User Anton Shestakov <av6@dwimlabs.net>
# Date 1527241096 -7200
#      Fri May 25 11:38:16 2018 +0200
# Node ID 8133b5cebe9bcc24d91a0e5dfe2b191a11e4bb44
# Parent  6f67bfe4b82f1d0f1e11a7092262f961c49fdac9
extensions: PoC readme extension for hgweb

Here's a prototype extension that implements a feature that everybody expects
from a code hosting solution nowadays: rendering README files.

This extension uses pandoc to convert files from various formats to HTML for
display on the summary page. It wraps only the summary web command (so no
README while browsing files, for example), then tries to match files using a
configurable fileset and, if that succeeds, runs the first matching file (in
sorted order) through pandoc. The rendered HTML is cached so that pandoc
doesn't have to be spawned on every request.
Because pandoc can highlight code using pygments-compatible CSS classes, this
extension also injects a link to pygments styles when the highlight extension
is loaded.

A couple of the more obvious problems:
- Only works on summary page, so no way to go browse around the repo and see
  the appropriate README from the current subdirectory.
- Pandoc is primarily built for enabling interactivity (just look at all those
  HTML5 presentation formats full of JS), but doesn't have any config knob to
  disable JS entirely. This might be a security problem.

This extension takes heavy inspiration from the highlight extension (in
wrapping a web command and not requiring code in core Mercurial), but maybe
that's not such a good idea. I'm thinking about making something like a
{readme} keyword in core that outputs nothing, but is backed by a nice
accessible function that extensions can override. That way it can easily work
on any page in hgweb, if needed.
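
A minimal sketch of the cache-then-convert flow described above, in plain
Python 3 rather than against Mercurial's API. The names cached_render and
convert are hypothetical: convert stands in for the pandoc subprocess call, and
an ordinary directory stands in for repo.cachevfs.

```python
import os
import tempfile


def cached_render(cache_dir, node_hex, filename, text, convert):
    """Return HTML for ``text``, calling ``convert`` only on a cache miss."""
    # Key the cache entry on the file revision and base name, mirroring the
    # '<hex>-<name>.html' scheme used in the patch.
    cache_path = os.path.join(cache_dir, node_hex + '-' + filename + '.html')
    try:
        with open(cache_path, 'rb') as f:
            return f.read()
    except OSError:
        pass  # cache miss: fall through and convert

    html = convert(text)

    # Write to a temporary file first, then rename it into place, so a
    # concurrent reader never sees a partially written cache entry.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, 'wb') as f:
        f.write(html)
    os.replace(tmp_path, cache_path)
    return html
```

Because the key includes the file revision, a changed README gets a new cache
entry and stale entries are simply never read again.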
Yuya Nishihara - May 25, 2018, 2:13 p.m.
On Fri, 25 May 2018 14:12:03 +0200, Anton Shestakov wrote:
> # HG changeset patch
> # User Anton Shestakov <av6@dwimlabs.net>
> # Date 1527241096 -7200
> #      Fri May 25 11:38:16 2018 +0200
> # Node ID 8133b5cebe9bcc24d91a0e5dfe2b191a11e4bb44
> # Parent  6f67bfe4b82f1d0f1e11a7092262f961c49fdac9
> extensions: PoC readme extension for hgweb
> 
> Here's a prototype extension that implements a feature that everybody expects
> from a code hosting solution nowadays: rendering README files.
> 
> This extension uses pandoc to convert files from various formats to HTML for
> display on the summary page. It wraps only the summary web command (so no
> README while browsing files, for example), then tries to match files using a
> configurable fileset and, if that succeeds, runs the first matching file (in
> sorted order) through pandoc. The rendered HTML is cached so that pandoc
> doesn't have to be spawned on every request.
> Because pandoc can highlight code using pygments-compatible CSS classes, this
> extension also injects a link to pygments styles when the highlight extension
> is loaded.
> 
> A couple of the more obvious problems:
> - Only works on summary page, so no way to go browse around the repo and see
>   the appropriate README from the current subdirectory.
> - Pandoc is primarily built for enabling interactivity (just look at all those
>   HTML5 presentation formats full of JS), but doesn't have any config knob to
>   disable JS entirely. This might be a security problem.
> 
> This extension takes heavy inspiration from the highlight extension (in
> wrapping a web command and not requiring code in core Mercurial), but maybe
> that's not such a good idea. I'm thinking about making something like a
> {readme} keyword in core that outputs nothing, but is backed by a nice
> accessible function that extensions can override. That way it can easily work
> on any page in hgweb, if needed.

Yeah, how highlight works is horrible. It hooks a filter function to
replace the given content with next(coloriter). The colorize filter is not
a filter at all.

Maybe we can add a {readme} keyword in core, and add a render filter/function
as the extension point? The core implementation of "render" would &-escape
the content and wrap it in <pre></pre>.
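
A rough sketch of what such a default "render" could look like, written as
standalone Python 3 for illustration. The function name default_render is
hypothetical, and a real hgweb filter would work on bytes and be registered
through the templater rather than called directly.

```python
import html


def default_render(text):
    """Fallback renderer: escape HTML-special characters, wrap in <pre>."""
    # quote=False leaves quotes alone; '&', '<' and '>' are still escaped,
    # which is enough for text placed inside an element body.
    return '<pre>' + html.escape(text, quote=False) + '</pre>'
```

An extension such as the one in this patch would then replace this function
with one that returns converted markup, while installations without any such
extension still get a safe plain-text README.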

Patch

diff --git a/hgext3rd/readme.py b/hgext3rd/readme.py
new file mode 100644
--- /dev/null
+++ b/hgext3rd/readme.py
@@ -0,0 +1,91 @@ 
+from __future__ import absolute_import
+
+import os.path
+import tempfile
+
+from mercurial import (
+    error,
+    extensions,
+    fileset,
+    util,
+)
+
+from mercurial.hgweb import (
+    webcommands,
+)
+
+from mercurial.utils import (
+    procutil,
+    stringutil,
+)
+
+# Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
+# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
+# be specifying the version(s) of Mercurial they are tested with, or
+# leave the attribute unspecified.
+testedwith = '4.6'
+
+def render(repo, fctx, tmpl):
+    try:
+        highlight = extensions.find('highlight')
+    except KeyError:
+        highlight = None
+
+    if highlight is not None:
+        # append a <link ...> to the syntax highlighting css
+        old_header = tmpl.load('header')
+        if highlight.highlight.SYNTAX_CSS not in old_header:
+            new_header = old_header + highlight.highlight.SYNTAX_CSS
+            tmpl.cache['header'] = new_header
+
+    text = fctx.data()
+    if stringutil.binary(text):
+        return
+
+    fn = os.path.basename(fctx.path())
+    htmlfn = fctx.hex() + '-' + fn + '.html'
+    try:
+        tmpl.cache['readme'] = repo.cachevfs.read(htmlfn)
+    except (IOError, OSError):
+        tmpdir = tempfile.mkdtemp(prefix='readme.')
+        fn = os.path.join(tmpdir, fn)
+        with open(fn, 'wb') as fin:
+            fin.write(util.tonativeeol(text))
+
+        cmdline = 'pandoc --base-header-level 2 --to html ' + procutil.shellquote(fn)
+        fp = procutil.popen(cmdline, 'rb')
+        tmpl.cache['readme'] = fp.read()
+        ret = fp.close()
+        if ret:
+            raise error.Abort('pandoc %s' % procutil.explainexit(ret))
+        try:
+            with repo.cachevfs.open(htmlfn, 'wb', atomictemp=True) as f:
+                f.write(tmpl.cache['readme'])
+        except (IOError, OSError, error.Abort, error.LockError) as e:
+            repo.ui.debug("couldn't write readme cache: %s\n"
+                          % stringutil.forcebytestr(e))
+
+def findreadme(web, ctx):
+    expr = web.config('web', 'readme-files', 'README* and size("<128K")')
+
+    tree = fileset.parse(expr)
+    mctx = fileset.matchctx(ctx, ctx.manifest(), status=None)
+    readmes = fileset.getset(mctx, tree)
+    if readmes:
+        return ctx[sorted(readmes)[0]]
+
+def readmesummary(orig, web):
+    ct = web.res.headers['Content-Type']
+    if 'html' in ct:
+        fctx = findreadme(web, web.repo['tip'])
+        if fctx:
+            render(web.repo, fctx, web.tmpl)
+
+    return orig(web)
+
+def extsetup():
+    def afterload(loaded):
+        if loaded:
+            extensions.wrapfunction(webcommands, 'summary', readmesummary)
+
+    extensions.afterloaded('highlight', afterload)
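
For readers unfamiliar with filesets, the selection done by findreadme() above
can be approximated in plain Python 3 as below. This is only a sketch:
pick_readme and its parameters are hypothetical names, the manifest is modeled
as a dict of path to size in bytes, and "128K" is assumed to mean 128 * 1024
bytes.

```python
import fnmatch


def pick_readme(sizes, pattern='README*', max_size=128 * 1024):
    """Return the first matching path in sorted order, or None.

    ``sizes`` maps repository-relative paths to file sizes in bytes.
    """
    matches = [
        path for path, size in sizes.items()
        # fnmatch lets '*' match '/', so exclude subdirectories explicitly
        # to keep the match limited to the repository root.
        if '/' not in path
        and fnmatch.fnmatchcase(path, pattern)
        and size < max_size
    ]
    return sorted(matches)[0] if matches else None
```

With a manifest containing README, README.md and an oversized README.big, this
picks README, since it sorts first among the files under the size limit.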