Patchwork [hgweb-help-followup] help: teach topic symbols how to dedent

login
register
mail settings
Submitter Gregory Szorc
Date Feb. 9, 2015, 11:06 p.m.
Message ID <bd6a8cf1710d95b593ba.1423523163@vm-ubuntu-main.gateway.sonic.net>
Download mbox | patch
Permalink /patch/7771/
State Accepted
Commit 067540702f640afdc9f09362c4a0cca3ad144951
Headers show

Comments

Gregory Szorc - Feb. 9, 2015, 11:06 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1423522744 28800
#      Mon Feb 09 14:59:04 2015 -0800
# Node ID bd6a8cf1710d95b593babf3cad3224adff6064f9
# Parent  1f4fd12991f2b32b1277bd3e047132d5f087b914
help: teach topic symbols how to dedent

When using docstrings for documenting symbols such as revsets,
templates, or hgweb commands, documentation likely has leading
whitespace corresponding to the indentation from the Python source
file.

Up until this point, the help system stripped all leading and
trailing whitespace and replaced it with 2 spaces of leading
whitespace. There were a few bad side-effects. First, sections
could not be used in docstrings because they would be indented
and the rst parser would fail to parse them as sections. Also,
any rst elements that required indentation would lose their
indentation, again causing them to be parsed and rendered
incorrectly.

In this patch, we teach the topic symbols system how to dedent
text properly. I argue this mode should be enabled by default.
However, I stopped short of changing that because it would cause
a lot of documentation reformatting to occur. I'm not sure if
people are relying on or wanting indentation. So, dedenting has
only been turned on for hgweb symbols. This decision should be
scrutinized.
Augie Fackler - Feb. 9, 2015, 11:45 p.m.
On Mon, Feb 09, 2015 at 03:06:03PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1423522744 28800
> #      Mon Feb 09 14:59:04 2015 -0800
> # Node ID bd6a8cf1710d95b593babf3cad3224adff6064f9
> # Parent  1f4fd12991f2b32b1277bd3e047132d5f087b914
> help: teach topic symbols how to dedent
>
> When using docstrings for documenting symbols such as revsets,
> templates, or hgweb commands, documentation likely has leading
> whitespace corresponding to the indentation from the Python source
> file.
>
> Up until this point, the help system stripped all leading and
> trailing whitespace and replaced it with 2 spaces of leading
> whitespace. There were a few bad side-effects. First, sections
> could not be used in docstrings because they would be indented
> and the rst parser would fail to parse them as sections. Also,
> any rst elements that required indentation would lose their
> indentation, again causing them to be parsed and rendered
> incorrectly.
>
> In this patch, we teach the topic symbols system how to dedent
> text properly. I argue this mode should be enabled by default.
> However, I stopped short of changing that because it would cause
> a lot of documentation reformatting to occur. I'm not sure if
> people are relying on or wanting indentation. So, dedenting has
> only been turned on for hgweb symbols. This decision should be
> scrutinized.

This looks reasonable, as does the conservative approach. A good
followup series (IMO) would be to figure out what can't handle
dedentation and fix that up so we can just turn this on globally.

Queued, and hopefully will push the hgweb-help series soon.

>
> diff --git a/mercurial/help.py b/mercurial/help.py
> --- a/mercurial/help.py
> +++ b/mercurial/help.py
> @@ -5,9 +5,9 @@
>  # This software may be used and distributed according to the terms of the
>  # GNU General Public License version 2 or any later version.
>
>  from i18n import gettext, _
> -import itertools, os
> +import itertools, os, textwrap
>  import error
>  import extensions, revset, fileset, templatekw, templatefilters, filemerge
>  import encoding, util, minirst
>  import cmdutil
> @@ -171,9 +171,9 @@ helphooks = {}
>
>  def addtopichook(topic, rewriter):
>      helphooks.setdefault(topic, []).append(rewriter)
>
> -def makeitemsdoc(topic, doc, marker, items):
> +def makeitemsdoc(topic, doc, marker, items, dedent=False):
>      """Extract docstring from the items key to function mapping, build a
>      .single documentation block and use it to overwrite the marker in doc
>      """
>      entries = []
> @@ -181,30 +181,36 @@ def makeitemsdoc(topic, doc, marker, ite
>          text = (items[name].__doc__ or '').rstrip()
>          if not text:
>              continue
>          text = gettext(text)
> +        if dedent:
> +            text = textwrap.dedent(text)
>          lines = text.splitlines()
>          doclines = [(lines[0])]
>          for l in lines[1:]:
>              # Stop once we find some Python doctest
>              if l.strip().startswith('>>>'):
>                  break
> -            doclines.append('  ' + l.strip())
> +            if dedent:
> +                doclines.append(l.rstrip())
> +            else:
> +                doclines.append('  ' + l.strip())
>          entries.append('\n'.join(doclines))
>      entries = '\n\n'.join(entries)
>      return doc.replace(marker, entries)
>
> -def addtopicsymbols(topic, marker, symbols):
> +def addtopicsymbols(topic, marker, symbols, dedent=False):
>      def add(topic, doc):
> -        return makeitemsdoc(topic, doc, marker, symbols)
> +        return makeitemsdoc(topic, doc, marker, symbols, dedent=dedent)
>      addtopichook(topic, add)
>
>  addtopicsymbols('filesets', '.. predicatesmarker', fileset.symbols)
>  addtopicsymbols('merge-tools', '.. internaltoolsmarker', filemerge.internals)
>  addtopicsymbols('revsets', '.. predicatesmarker', revset.symbols)
>  addtopicsymbols('templates', '.. keywordsmarker', templatekw.dockeywords)
>  addtopicsymbols('templates', '.. filtersmarker', templatefilters.filters)
> -addtopicsymbols('hgweb', '.. webcommandsmarker', webcommands.commands)
> +addtopicsymbols('hgweb', '.. webcommandsmarker', webcommands.commands,
> +                dedent=True)
>
>  def help_(ui, name, unknowncmd=False, full=True, **opts):
>      '''
>      Generate the help for 'name' as unformatted restructured text. If
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Gregory Szorc - Feb. 10, 2015, 8:12 a.m.
On Mon, Feb 9, 2015 at 3:45 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Feb 09, 2015 at 03:06:03PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1423522744 28800
> > #      Mon Feb 09 14:59:04 2015 -0800
> > # Node ID bd6a8cf1710d95b593babf3cad3224adff6064f9
> > # Parent  1f4fd12991f2b32b1277bd3e047132d5f087b914
> > help: teach topic symbols how to dedent
> >
> > When using docstrings for documenting symbols such as revsets,
> > templates, or hgweb commands, documentation likely has leading
> > whitespace corresponding to the indentation from the Python source
> > file.
> >
> > Up until this point, the help system stripped all leading and
> > trailing whitespace and replaced it with 2 spaces of leading
> > whitespace. There were a few bad side-effects. First, sections
> > could not be used in docstrings because they would be indented
> > and the rst parser would fail to parse them as sections. Also,
> > any rst elements that required indentation would lose their
> > indentation, again causing them to be parsed and rendered
> > incorrectly.
> >
> > In this patch, we teach the topic symbols system how to dedent
> > text properly. I argue this mode should be enabled by default.
> > However, I stopped short of changing that because it would cause
> > a lot of documentation reformatting to occur. I'm not sure if
> > people are relying on or wanting indentation. So, dedenting has
> > only been turned on for hgweb symbols. This decision should be
> > scrutinized.
>
> This looks reasonable, as does the conservative approach. A good
> followup series (IMO) would be to figure out what can't handle
> dedentation and fix that up so we can just turn this on globally.
>

I've looked into this and it is pain and more pain.

First, using textwrap.dedent() won't work universally. e.g.

def foo():
    """This is a summary.

    This is a new line."""

foo.__doc__ is "This is a summary.\n\n    This is a new line." The first
line is not indented, so textwrap.dedent() no-ops, since its job is to look
for universal indenting, ignoring empty lines.

PEP-257 recommends a parsing algorithm that does a custom strip based on
minimal leading whitespace in lines[1:]. Looks sane to me. We'll want to
implement that.

So I did.

But that took me to the next hurdle.

After implementing the recommended docstring normalization method, docs
weren't rendering properly at all. Why? Well, a lot of our docstrings (like
revsets) have the form:

def func():
    """``func()``
    This is a description.
    """

Our existing docstring normalizer normalizes this to "``func()``\n  This is
a description" (note the leading 2 spaces on subsequent lines). That
leading whitespace is important. Without it, a rst parser will assume the
next line is a line
continuation and it will merge the contents into a single paragraph. This,
of course, completely ruins the help documentation formatting.

On one hand, Mercurial's docstrings for things like revsets are in
violation of the recommendations in PEP-257 because there isn't a blank
line between the summary and description lines. We should look into
correcting that. On the other hand, this has been the convention for ages.
We can't be naive and assume extensions haven't cargo culted this
convention (I know I have). So, we'll probably want a mechanism to insert
an empty line between lines[0] and lines[1] in docstrings before we can
even talk about dedenting docstrings.

If we dedent doc strings, that replaces the 2 space indentation with...
nothing. Unfortunately, that 2 space indent is hard-coded in a number of
help pages (see the revset operators docs). So, we're talking about scope
creep to reformat a lot of help pages as well to match the new conventions.

If you get this far, you find that losing indentation makes the docs not
very readable. So we'll probably also want to replace "``foo``" patterns
with rst sections. Again, we'll likely want to update docstrings in source
*and* teach the help formatter how to do it so existing extensions don't
blow up help formatting. This will restore indentation and readability to
the plain text docs. As a bonus, it probably makes the HTML docs a little
easier to read too.

I think I know what needs to be done. But this rabbit hole runs deeper than
I initially thought. But good docs is worth the trouble. Let me see if I
can find time to crank out a patch series.

Patch

diff --git a/mercurial/help.py b/mercurial/help.py
--- a/mercurial/help.py
+++ b/mercurial/help.py
@@ -5,9 +5,9 @@ 
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
 from i18n import gettext, _
-import itertools, os
+import itertools, os, textwrap
 import error
 import extensions, revset, fileset, templatekw, templatefilters, filemerge
 import encoding, util, minirst
 import cmdutil
@@ -171,9 +171,9 @@  helphooks = {}
 
 def addtopichook(topic, rewriter):
     helphooks.setdefault(topic, []).append(rewriter)
 
-def makeitemsdoc(topic, doc, marker, items):
+def makeitemsdoc(topic, doc, marker, items, dedent=False):
     """Extract docstring from the items key to function mapping, build a
     .single documentation block and use it to overwrite the marker in doc
     """
     entries = []
@@ -181,30 +181,36 @@  def makeitemsdoc(topic, doc, marker, ite
         text = (items[name].__doc__ or '').rstrip()
         if not text:
             continue
         text = gettext(text)
+        if dedent:
+            text = textwrap.dedent(text)
         lines = text.splitlines()
         doclines = [(lines[0])]
         for l in lines[1:]:
             # Stop once we find some Python doctest
             if l.strip().startswith('>>>'):
                 break
-            doclines.append('  ' + l.strip())
+            if dedent:
+                doclines.append(l.rstrip())
+            else:
+                doclines.append('  ' + l.strip())
         entries.append('\n'.join(doclines))
     entries = '\n\n'.join(entries)
     return doc.replace(marker, entries)
 
-def addtopicsymbols(topic, marker, symbols):
+def addtopicsymbols(topic, marker, symbols, dedent=False):
     def add(topic, doc):
-        return makeitemsdoc(topic, doc, marker, symbols)
+        return makeitemsdoc(topic, doc, marker, symbols, dedent=dedent)
     addtopichook(topic, add)
 
 addtopicsymbols('filesets', '.. predicatesmarker', fileset.symbols)
 addtopicsymbols('merge-tools', '.. internaltoolsmarker', filemerge.internals)
 addtopicsymbols('revsets', '.. predicatesmarker', revset.symbols)
 addtopicsymbols('templates', '.. keywordsmarker', templatekw.dockeywords)
 addtopicsymbols('templates', '.. filtersmarker', templatefilters.filters)
-addtopicsymbols('hgweb', '.. webcommandsmarker', webcommands.commands)
+addtopicsymbols('hgweb', '.. webcommandsmarker', webcommands.commands,
+                dedent=True)
 
 def help_(ui, name, unknowncmd=False, full=True, **opts):
     '''
     Generate the help for 'name' as unformatted restructured text. If