Patchwork [py3] ui: construct _keepalnum list in a python3-friendly way

Submitter Augie Fackler
Date Feb. 20, 2017, 1:05 a.m.
Message ID <0793350D-06B5-46F0-8D06-D7CCB9B0F296@durin42.com>
Permalink /patch/18677/
State Not Applicable

Comments

Augie Fackler - Feb. 20, 2017, 1:05 a.m.
> On Feb 19, 2017, at 9:29 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> On Sat, 18 Feb 2017 22:58:10 +0000, Martijn Pieters wrote:
>> On 16 Feb 2017, at 16:35, Augie Fackler <raf@durin42.com> wrote:
>>> +if pycompat.ispy3:
>>> +    _unicodes = [bytes([c]).decode('latin1') for c in range(256)]
>>> +    _notalnum = [s.encode('latin1') for s in _unicodes if not s.isalnum()]
>> 
>> ...
>>> +_keepalnum = ''.join(_notalnum)
>> 
>> This could be more cheaply calculated as
>> 
>>    _keepalnum = bytes(c for c in range(256) if not chr(c).isalnum())
>> 
>> This takes a third of the time.
> 
> Good catch, but I found both of them are incorrect since str.isalnum() is
> unicode aware on Python3. We'll need to use bytes.isalnum() or string.*
> constants.

Oh, gross. I missed that. I think this patch fixes it, though not with the perf wins Martijn suggested:



Feel free to amend that into what’s already queued, or I can do a followup or resend as feels appropriate.
Yuya Nishihara - Feb. 20, 2017, 1:35 p.m.
On Sun, 19 Feb 2017 20:05:50 -0500, Augie Fackler wrote:
> 
> > On Feb 19, 2017, at 9:29 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > 
> > On Sat, 18 Feb 2017 22:58:10 +0000, Martijn Pieters wrote:
> >> On 16 Feb 2017, at 16:35, Augie Fackler <raf@durin42.com> wrote:
> >>> +if pycompat.ispy3:
> >>> +    _unicodes = [bytes([c]).decode('latin1') for c in range(256)]
> >>> +    _notalnum = [s.encode('latin1') for s in _unicodes if not s.isalnum()]
> >> 
> >> ...
> >>> +_keepalnum = ''.join(_notalnum)
> >> 
> >> This could be more cheaply calculated as
> >> 
> >>    _keepalnum = bytes(c for c in range(256) if not chr(c).isalnum())
> >> 
> >> This takes a third of the time.
> > 
> > Good catch, but I found both of them are incorrect since str.isalnum() is
> > unicode aware on Python3. We'll need to use bytes.isalnum() or string.*
> > constants.
> 
> Oh, gross. I missed that. I think this patch fixes it, though not with the perf wins Martijn suggested:
> 
> diff --git a/mercurial/ui.py b/mercurial/ui.py
> --- a/mercurial/ui.py
> +++ b/mercurial/ui.py
> @@ -40,8 +40,8 @@ urlreq = util.urlreq
>  
>  # for use with str.translate(None, _keepalnum), to keep just alphanumerics
>  if pycompat.ispy3:
> -    _unicodes = [bytes([c]).decode('latin1') for c in range(256)]
> -    _notalnum = [s.encode('latin1') for s in _unicodes if not s.isalnum()]
> +    _bytes = [bytes([c]) for c in range(256)]
> +    _notalnum = [s for s in _bytes if not s.isalnum()]
>  else:
>      _notalnum = [c for c in map(chr, range(256)) if not c.isalnum()]
>  _keepalnum = ''.join(_notalnum)
> 
> Feel free to amend that into what’s already queued, or I can do a followup or resend as feels appropriate.

Applied this and rebased the other patches, thanks.
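
For readers following the thread, here is a minimal Python 3 sketch of the problem Yuya pointed out: str.isalnum() is Unicode-aware, so it accepts Latin-1 code points that bytes.isalnum() rejects. The names str_alnum, bytes_alnum and extra are illustrative only and do not appear in mercurial/ui.py.

```python
# Python 3 only. Compare str.isalnum() (Unicode-aware) with
# bytes.isalnum() (ASCII-only) over all 256 byte values.
str_alnum = {c for c in range(256) if chr(c).isalnum()}
bytes_alnum = {c for c in range(256) if bytes([c]).isalnum()}

# Byte values that the str-based check treats as alphanumeric
# but the bytes-based check does not.
extra = sorted(str_alnum - bytes_alnum)

print(len(bytes_alnum))                # 62: a-z, A-Z, 0-9
print(0xE9 in str_alnum)               # True: chr(0xE9) is 'é'
print(0xE9 in bytes_alnum)             # False
print(all(c >= 0x80 for c in extra))   # True: only non-ASCII values differ
```

Any table built from the str-based check would therefore keep some non-ASCII bytes, which is why the applied patch switches to bytes.isalnum().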

Patch

diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -40,8 +40,8 @@  urlreq = util.urlreq
 
 # for use with str.translate(None, _keepalnum), to keep just alphanumerics
 if pycompat.ispy3:
-    _unicodes = [bytes([c]).decode('latin1') for c in range(256)]
-    _notalnum = [s.encode('latin1') for s in _unicodes if not s.isalnum()]
+    _bytes = [bytes([c]) for c in range(256)]
+    _notalnum = [s for s in _bytes if not s.isalnum()]
 else:
     _notalnum = [c for c in map(chr, range(256)) if not c.isalnum()]
 _keepalnum = ''.join(_notalnum)
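
As a rough illustration of how such a table is used, and of the two fixes Yuya suggested (bytes.isalnum() or string.* constants), the following Python 3 sketch builds the delete table both ways and applies it with bytes.translate(None, ...). The names delete_table, delete_table_alt and keep_alnum are hypothetical and are not part of the patch; this is a sketch of the technique, not the code that was queued.

```python
import string

# Python 3 only. Delete table: every byte that is not ASCII alphanumeric.
delete_table = bytes(c for c in range(256) if not bytes([c]).isalnum())

# The same table derived from string.* constants instead of isalnum().
ascii_alnum = (string.ascii_letters + string.digits).encode('ascii')
delete_table_alt = bytes(c for c in range(256) if c not in ascii_alnum)


def keep_alnum(data):
    """Return data with every non-alphanumeric byte removed."""
    # translate(None, delete) leaves bytes unmapped and drops those in delete.
    return data.translate(None, delete_table)


print(delete_table == delete_table_alt)      # True
print(keep_alnum(b'ui.debug-flag_1 \xe9'))   # b'uidebugflag1'
```

Building the table as a single bytes object from a generator follows the shape of Martijn's suggestion, with the membership test changed so it is no longer Unicode-aware.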