Patchwork [07 of 10] py3: handle unicode docstrings in registrar.py

Submitter: Pulkit Goyal
Date: Aug. 2, 2016, 8:27 p.m.
Message ID: <a77a7f6e8bfc90901d82.1470169649@pulkit-goyal>
Permalink: /patch/16046/
State: Changes Requested

Comments

Pulkit Goyal - Aug. 2, 2016, 8:27 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1470168548 -19800
#      Wed Aug 03 01:39:08 2016 +0530
# Node ID a77a7f6e8bfc90901d829257a782ff11e2bae0f7
# Parent  da4a0ba184d3eff2819d73884770d342edce88c1
py3: handle unicode docstrings in registrar.py

The module importer on Python 3 doesn't rewrite docstrings to bytes
literals, so they remain str. We therefore need to teach document
formatting to convert the documentation to bytes before formatting,
so that consistent types are used and the result is bytes.
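A minimal sketch of the problem the patch addresses, assuming a bytes format string shaped like registrar's `self._docformat` (the value `b"%s: %s"`, the name `format_doc`, and the declaration `b"myrevset(arg)"` are hypothetical stand-ins, not Mercurial's actual values). On Python 3, bytes %-formatting rejects a str operand, so a str docstring has to be encoded first:

```python
# Hypothetical stand-ins for registrar's format string and a declaration.
docformat = b"%s: %s"
decl = b"myrevset(arg)"

def format_doc(decl, doc):
    # On Python 3 the docstring is still str, because the source
    # transformer does not rewrite docstrings to bytes literals.
    # Encode it (assuming UTF-8 source files) so both operands are bytes.
    if not isinstance(doc, bytes):
        doc = doc.encode("utf-8")
    return docformat % (decl, doc)

# Without the conversion, bytes %-formatting raises TypeError for a str.
try:
    docformat % (decl, "Select changesets.")
except TypeError as exc:
    print("TypeError:", exc)

print(format_doc(decl, "Select changesets."))
```

Bytes input passes through unchanged, so the check is harmless on Python 2, where docstrings are already byte strings.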
Yuya Nishihara - Aug. 3, 2016, 2:16 p.m.
On Wed, 03 Aug 2016 01:57:29 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1470168548 -19800
> #      Wed Aug 03 01:39:08 2016 +0530
> # Node ID a77a7f6e8bfc90901d829257a782ff11e2bae0f7
> # Parent  da4a0ba184d3eff2819d73884770d342edce88c1
> py3: handle unicode docstrings in registrar.py
> 
> The module importer on Python 3 doesn't rewrite docstrings to bytes
> literals. So we need to teach document formatting to convert the
> documentation to bytes before formatting to ensure consistent
> types are used and the result is bytes.
> 
> diff -r da4a0ba184d3 -r a77a7f6e8bfc mercurial/registrar.py
> --- a/mercurial/registrar.py	Wed Aug 03 01:33:29 2016 +0530
> +++ b/mercurial/registrar.py	Wed Aug 03 01:39:08 2016 +0530
> @@ -83,6 +83,10 @@
>  
>          'doc' is '__doc__.strip()' of the registered function.
>          """
> +        # docstrings are using the source file encoding, which should be
> +        # utf-8.
> +        if not isinstance(doc, bytes):
> +            doc = doc.encode(u'utf-8')
>          return self._docformat % (decl, doc)

I think the conversion should be made where __doc__ is accessed. Doing that
at random places would be a source of bugs.
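One way to read the review suggestion is to normalize the docstring once, at the point where `__doc__` is read, so that every later consumer only ever sees bytes. The sketch below is hypothetical: `getdoc`, the minimal `registrar` class, and its `_table` attribute are illustrative names, not Mercurial's API. It only follows the diff context's statement that `doc` is `__doc__.strip()` of the registered function:

```python
def getdoc(func, encoding="utf-8"):
    """Return func's stripped docstring as bytes, or None if it has none."""
    doc = func.__doc__
    if doc is None:
        return None
    doc = doc.strip()
    if not isinstance(doc, bytes):
        # Assumes UTF-8 source files, as the patch comment does.
        doc = doc.encode(encoding)
    return doc

class registrar(object):
    """Hypothetical minimal registrar that formats docs at registration."""
    _docformat = b"%s: %s"

    def __init__(self):
        self._table = {}

    def __call__(self, decl):
        def decorator(func):
            name = decl.split(b"(", 1)[0]
            doc = getdoc(func)  # the only place __doc__ is read
            if doc is not None:
                func._origdoc = doc
                func.__doc__ = self._formatdoc(decl, doc)
            self._table[name] = func
            return func
        return decorator

    def _formatdoc(self, decl, doc):
        # doc is guaranteed to be bytes here, so no type check is needed.
        return self._docformat % (decl, doc)

predicate = registrar()

@predicate(b"myrevset(arg)")
def myrevset():
    """Select changesets matching arg."""
```

Because the conversion lives in a single accessor, `_formatdoc` and any other formatter can assume bytes. This is the consistency argument the review makes against converting "at random places".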

Patch

diff -r da4a0ba184d3 -r a77a7f6e8bfc mercurial/registrar.py
--- a/mercurial/registrar.py	Wed Aug 03 01:33:29 2016 +0530
+++ b/mercurial/registrar.py	Wed Aug 03 01:39:08 2016 +0530
@@ -83,6 +83,10 @@ 
 
         'doc' is '__doc__.strip()' of the registered function.
         """
+        # docstrings are using the source file encoding, which should be
+        # utf-8.
+        if not isinstance(doc, bytes):
+            doc = doc.encode(u'utf-8')
         return self._docformat % (decl, doc)
 
     def _extrasetup(self, name, func):