Patchwork OS X: try cheap ascii .lower() in normcase before making full unicode dance

login
register
mail settings
Submitter Mads Kiilerich
Date Jan. 29, 2013, 4:42 p.m.
Message ID <b7e3f9a8796fc31b6d81.1359477759@mk-desktop>
Download mbox | patch
Permalink /patch/755/
State Accepted
Commit a3b2dc1aa909e22b1c16cc0e46d17d04105e0e15
Headers show

Comments

Mads Kiilerich - Jan. 29, 2013, 4:42 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1359475301 -3600
# Branch stable
# Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
# Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
OS X: try cheap ascii .lower() in normcase before making full unicode dance

This is similar to what is done in encoding.lower, introduced in c481761033bd.

This has been seen making 'hg up' and 'hg st' in a 50000+ files repo 13%
faster.

This might make Mercurial slightly slower for users who mainly use non-ASCII
filenames. That is a reasonable trade-off.

Some numbers:

hg up
before:
time: real 2.100 secs (user 1.900+0.000 sys 0.200+0.000)
time: real 2.110 secs (user 1.930+0.000 sys 0.180+0.000)
time: real 2.080 secs (user 1.900+0.000 sys 0.180+0.000)
after:
time: real 1.830 secs (user 1.660+0.000 sys 0.180+0.000)
time: real 1.840 secs (user 1.660+0.000 sys 0.180+0.000)
time: real 1.810 secs (user 1.640+0.000 sys 0.180+0.000)

hg st
before:
time: real 1.680 secs (user 1.370+0.000 sys 0.310+0.000)
time: real 1.650 secs (user 1.340+0.000 sys 0.300+0.000)
time: real 1.660 secs (user 1.350+0.000 sys 0.310+0.000)
after:
time: real 1.460 secs (user 1.150+0.000 sys 0.310+0.000)
time: real 1.430 secs (user 1.120+0.000 sys 0.300+0.000)
time: real 1.420 secs (user 1.120+0.000 sys 0.300+0.000)
Mads Kiilerich - Jan. 29, 2013, 7:09 p.m.
On 01/29/2013 06:23 PM, Kevin Bullock wrote:
> On Jan 29, 2013, at 10:42 AM, Mads Kiilerich wrote:
>
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1359475301 -3600
>> # Branch stable
>> # Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
>> # Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
>> OS X: try cheap ascii .lower() in normcase before making full unicode dance
> LGTM

crewed

/Mads
Matt Mackall - Jan. 29, 2013, 10:41 p.m.
On Tue, 2013-01-29 at 20:09 +0100, Mads Kiilerich wrote:
> On 01/29/2013 06:23 PM, Kevin Bullock wrote:
> > On Jan 29, 2013, at 10:42 AM, Mads Kiilerich wrote:
> >
> >> # HG changeset patch
> >> # User Mads Kiilerich <madski@unity3d.com>
> >> # Date 1359475301 -3600
> >> # Branch stable
> >> # Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
> >> # Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
> >> OS X: try cheap ascii .lower() in normcase before making full unicode dance
> > LGTM
> 
> crewed

Surprised you both think this is stable-appropriate.
Matt Mackall - Jan. 29, 2013, 11:13 p.m.
On Tue, 2013-01-29 at 16:53 -0600, Kevin Bullock wrote:
> On Jan 29, 2013, at 4:41 PM, Matt Mackall wrote:
> 
> > On Tue, 2013-01-29 at 20:09 +0100, Mads Kiilerich wrote:
> >> On 01/29/2013 06:23 PM, Kevin Bullock wrote:
> >>> On Jan 29, 2013, at 10:42 AM, Mads Kiilerich wrote:
> >>> 
> >>>> # HG changeset patch
> >>>> # User Mads Kiilerich <madski@unity3d.com>
> >>>> # Date 1359475301 -3600
> >>>> # Branch stable
> >>>> # Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
> >>>> # Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
> >>>> OS X: try cheap ascii .lower() in normcase before making full unicode dance
> >>> LGTM
> >> 
> >> crewed
> > 
> > Surprised you both think this is stable-appropriate.
> 
> I'm new here ;)
> 
> More seriously, Mads mentioned that it fixed a performance regression introduced with a previous bugfix (FWIW).

Yep. A performance regression does not automatically rise to the level
of "regression" or even "stable-worthy bug", despite having "regression"
in the name! By their nature, they are on a continuous priority spectrum
from "don't care" to "critical".

http://markmail.org/message/hlwxpm63vh32tnkj
Mads Kiilerich - Jan. 29, 2013, 11:19 p.m.
On 01/29/2013 11:41 PM, Matt Mackall wrote:
> On Tue, 2013-01-29 at 20:09 +0100, Mads Kiilerich wrote:
>> On 01/29/2013 06:23 PM, Kevin Bullock wrote:
>>> On Jan 29, 2013, at 10:42 AM, Mads Kiilerich wrote:
>>>
>>>> # HG changeset patch
>>>> # User Mads Kiilerich <madski@unity3d.com>
>>>> # Date 1359475301 -3600
>>>> # Branch stable
>>>> # Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
>>>> # Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
>>>> OS X: try cheap ascii .lower() in normcase before making full unicode dance
>>> LGTM
>> crewed
> Surprised you both think this is stable-appropriate.

That might have been a misjudgement.

It fixed a performance regression back from 1fa41d1f1351, made in stable 
and released in 2.0.1. (It traded some performance for correctness - 
which is fine ... but better performance and same correctness should 
also be fine).

It also uses a pattern already used in encoding.py, so I consider it 
quite low risk.

Admittedly not urgent, but I considered it safe and sufficiently nice to 
have.

If you disagree then we can back it out.

/Mads
Matt Mackall - Jan. 29, 2013, 11:53 p.m.
On Wed, 2013-01-30 at 00:19 +0100, Mads Kiilerich wrote:
> On 01/29/2013 11:41 PM, Matt Mackall wrote:
> > On Tue, 2013-01-29 at 20:09 +0100, Mads Kiilerich wrote:
> >> On 01/29/2013 06:23 PM, Kevin Bullock wrote:
> >>> On Jan 29, 2013, at 10:42 AM, Mads Kiilerich wrote:
> >>>
> >>>> # HG changeset patch
> >>>> # User Mads Kiilerich <madski@unity3d.com>
> >>>> # Date 1359475301 -3600
> >>>> # Branch stable
> >>>> # Node ID b7e3f9a8796fc31b6d81d134072b4daa3e0d42ec
> >>>> # Parent  a58d8936647aa270854cd919fe8e8b2da1c1c669
> >>>> OS X: try cheap ascii .lower() in normcase before making full unicode dance
> >>> LGTM
> >> crewed
> > Surprised you both think this is stable-appropriate.
> 
> That might have been a misjudgement.
> 
> It fixed a performance regression back from 1fa41d1f1351, made in stable 
> and released in 2.0.1. (It traded some performance for correctness - 
> which is fine ... but better performance and same correctness should 
> also be fine).
> 
> It also uses a pattern already used in encoding.py, so I consider it 
> quite low risk.

It is indeed practically zero risk, I absolutely agree. But stable is
less about managed risk (which people are notoriously bad at estimating)
than managed code churn (which is quite a bit easier to get a handle
on). This is why I also don't want to see code cleanups in stable.

> If you disagree then we can back it out.

We can leave it, but in the future I'd like folks to err towards less
churn.
Bryan O'Sullivan - Jan. 30, 2013, 5:40 p.m.
On Tue, Jan 29, 2013 at 3:13 PM, Matt Mackall <mpm@selenic.com> wrote:

> Yep. A performance regression does not automatically rise to the level
> of "regression" or even "stable-worthy bug", despite having "regression"
> in the name! By their nature, they are on a continuous priority spectrum
> from "don't care" to "critical".
>

I introduced a "perfregression" keyword to Bugzilla yesterday, so that we
can keep the two distinct.

Patch

diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -195,6 +195,11 @@ 
 
     def normcase(path):
         try:
+            path.decode('ascii') # throw exception for non-ASCII character
+            return path.lower()
+        except UnicodeDecodeError:
+            pass
+        try:
             u = path.decode('utf-8')
         except UnicodeDecodeError:
             # percent-encode any characters that don't round-trip