Patchwork [STABLE] churn: compute padding with unicode strings

Submitter Isaac Jurado
Date April 19, 2014, 11:03 p.m.
Message ID <66336e9c96377ca6e48f.1397948605@findus>
Permalink /patch/4413/
State Superseded
Commit 9846b40d01e78fe9d95abe6adc73dd0af31bc375

Comments

Isaac Jurado - April 19, 2014, 11:03 p.m.
# HG changeset patch
# User Isaac Jurado <diptongo@gmail.com>
# Date 1397913085 -7200
#      Sat Apr 19 15:11:25 2014 +0200
# Branch stable
# Node ID 66336e9c96377ca6e48f54d33390c1bcd24b209d
# Parent  d924e387604f4a1b90469de773517841eba40c80
churn: compute padding with unicode strings

Most UTF-8 aware terminals render a multibyte sequence as a single displayed
character.  Because the first column is padded by counting bytes, the second
column is not perfectly aligned in the presence of non-ASCII characters.
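
The misalignment described above can be reproduced outside Mercurial. The following is a minimal Python 3 sketch, not the churn code itself: `pad_bytes` is a hypothetical stand-in that mirrors the byte-counting `pad` helper, and the names and width of 10 are chosen only for illustration.

```python
# Illustration only (Python 3). pad_bytes is a hypothetical stand-in that
# mirrors the byte-counting pad() helper from the churn extension.
def pad_bytes(s: bytes, l: int) -> bytes:
    """Pad or truncate to l bytes, regardless of how many characters that is."""
    return (s + b" " * l)[:l]

names = ["user1", "El Niño"]

# Pad each UTF-8 encoded name to 10 bytes, then decode it for display.
rows = [pad_bytes(n.encode("utf-8"), 10).decode("utf-8") + "|" for n in names]

for row in rows:
    print(row)
```

Because "ñ" occupies two bytes in UTF-8, the second row ends up one character shorter than the first, so the `|` marker (standing in for the second column) shifts left by one position.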
Yuya Nishihara - April 20, 2014, 1:35 p.m.
On Sun, 20 Apr 2014 01:03:25 +0200, Isaac Jurado wrote:
> # HG changeset patch
> # User Isaac Jurado <diptongo@gmail.com>
> # Date 1397913085 -7200
> #      Sat Apr 19 15:11:25 2014 +0200
> # Branch stable
> # Node ID 66336e9c96377ca6e48f54d33390c1bcd24b209d
> # Parent  d924e387604f4a1b90469de773517841eba40c80
> churn: compute padding with unicode strings
> 
> Most UTF-8 aware terminals render a multibyte sequence as a single displayed
> character.  Because the first column is padded by counting bytes, the second
> column is not perfectly aligned in the presence of non-ASCII characters.
> 
> diff --git a/hgext/churn.py b/hgext/churn.py
> --- a/hgext/churn.py
> +++ b/hgext/churn.py
> @@ -10,6 +10,7 @@
>  
>  from mercurial.i18n import _
>  from mercurial import patch, cmdutil, scmutil, util, templater, commands
> +from mercurial import encoding
>  import os
>  import time, datetime
>  
> @@ -124,7 +125,7 @@
>      Aliases will be split from the rightmost "=".
>      '''
>      def pad(s, l):
> -        return (s + " " * l)[:l]
> +        return (s.decode("UTF-8") + u" " * l)[:l].encode(encoding.encoding)

It seems "s" isn't a UTF-8 byte string, because it is template output.  Also,
encoding.colwidth() would give a more accurate result.

http://mercurial.selenic.com/wiki/EncodingStrategy#Functions

Regards,
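
The review comment above suggests padding by display width rather than by byte or character count. The following is a rough Python 3 sketch of that idea using only the standard library. It is not Mercurial's `encoding.colwidth()`; `col_width` and `pad_columns` are hypothetical helpers, and the width rule (wide and fullwidth characters count as two columns, everything else as one) is a simplifying assumption that ignores combining characters.

```python
import unicodedata

def col_width(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth characters count as 2.

    Assumption: every other character, including combining marks, counts as 1.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def pad_columns(text: str, width: int) -> str:
    """Truncate text to at most `width` columns, then pad with spaces."""
    out, used = [], 0
    for ch in text:
        w = col_width(ch)
        if used + w > width:
            break  # the next character would overflow the column budget
        out.append(ch)
        used += w
    return "".join(out) + " " * (width - used)

for name in ["user1", "El Niño", "日本語"]:
    print(pad_columns(name, 10) + "|")
```

Working on decoded text and measuring columns keeps the `|` marker aligned for both accented Latin names and double-width CJK names, which neither byte counting nor plain character counting achieves.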
Isaac Jurado - April 20, 2014, 2:18 p.m.
On 20/04/2014 15:36, "Yuya Nishihara" <yuya@tcha.org> wrote:
>
> On Sun, 20 Apr 2014 01:03:25 +0200, Isaac Jurado wrote:
> > # HG changeset patch
> > # User Isaac Jurado <diptongo@gmail.com>
> > # Date 1397913085 -7200
> > #      Sat Apr 19 15:11:25 2014 +0200
> > # Branch stable
> > # Node ID 66336e9c96377ca6e48f54d33390c1bcd24b209d
> > # Parent  d924e387604f4a1b90469de773517841eba40c80
> > churn: compute padding with unicode strings
> >
> > Most UTF-8 aware terminals render a multibyte sequence as a single displayed
> > character.  Because the first column is padded by counting bytes, the second
> > column is not perfectly aligned in the presence of non-ASCII characters.
> >
> > diff --git a/hgext/churn.py b/hgext/churn.py
> > --- a/hgext/churn.py
> > +++ b/hgext/churn.py
> > @@ -10,6 +10,7 @@
> >
> >  from mercurial.i18n import _
> >  from mercurial import patch, cmdutil, scmutil, util, templater, commands
> > +from mercurial import encoding
> >  import os
> >  import time, datetime
> >
> > @@ -124,7 +125,7 @@
> >      Aliases will be split from the rightmost "=".
> >      '''
> >      def pad(s, l):
> > -        return (s + " " * l)[:l]
> > +        return (s.decode("UTF-8") + u" " * l)[:l].encode(encoding.encoding)
>
> It seems "s" isn't a UTF-8 byte string, because it is template output.  Also,
> encoding.colwidth() would give a more accurate result.
>
> http://mercurial.selenic.com/wiki/EncodingStrategy#Functions

You're right.  I didn't fully understand the functions in the encoding
module.  I'll check the wiki entry and take a deeper look.  Will resend
later today.

Thanks a lot for the review.

Patch

diff --git a/hgext/churn.py b/hgext/churn.py
--- a/hgext/churn.py
+++ b/hgext/churn.py
@@ -10,6 +10,7 @@ 
 
 from mercurial.i18n import _
 from mercurial import patch, cmdutil, scmutil, util, templater, commands
+from mercurial import encoding
 import os
 import time, datetime
 
@@ -124,7 +125,7 @@ 
     Aliases will be split from the rightmost "=".
     '''
     def pad(s, l):
-        return (s + " " * l)[:l]
+        return (s.decode("UTF-8") + u" " * l)[:l].encode(encoding.encoding)
 
     amap = {}
     aliases = opts.get('aliases')
diff --git a/tests/test-churn.t b/tests/test-churn.t
--- a/tests/test-churn.t
+++ b/tests/test-churn.t
@@ -159,4 +159,16 @@ 
   user4@x.com      2 *****************************
   with space       1 **************
 
+Test multibyte sequences in names
+
+  $ echo bar >> bar
+  $ hg --encoding utf-8 ci -m'changed bar' -u 'El Niño <nino@x.com>'
+  $ hg --encoding utf-8 churn -ct '{author|person}'
+  user1           4 **********************************************************
+  user3           3 ********************************************
+  user2           2 *****************************
+  user4           2 *****************************
+  El Ni\xc3\xb1o         1 *************** (esc)
+  with space      1 ***************
+
   $ cd ..