Patchwork [5,of,6,py3] dirstate: use list comprehension to get a list of keys

login
register
mail settings
Submitter Pulkit Goyal
Date March 16, 2017, 4:13 a.m.
Message ID <b5673a08993652a92c0e.1489637600@pulkit-goyal>
Download mbox | patch
Permalink /patch/19378/
State Accepted
Headers show

Comments

Pulkit Goyal - March 16, 2017, 4:13 a.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1489635027 -19800
#      Thu Mar 16 09:00:27 2017 +0530
# Node ID b5673a08993652a92c0e20a4e24d842194872454
# Parent  69aafa5d3e0f109c7b501758381f65fd9b375196
dirstate: use list comprehension to get a list of keys

We have used dict.keys() which returns a dict_keys() object instead
of list on Python 3. So this patch replaces that with list comprehension
which works both on Python 2 and 3.
via Mercurial-devel - March 16, 2017, 5 a.m.
On Wed, Mar 15, 2017 at 9:13 PM, Pulkit Goyal <7895pulkit@gmail.com> wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1489635027 -19800
> #      Thu Mar 16 09:00:27 2017 +0530
> # Node ID b5673a08993652a92c0e20a4e24d842194872454
> # Parent  69aafa5d3e0f109c7b501758381f65fd9b375196
> dirstate: use list comprehension to get a list of keys
>
> We have used dict.keys() which returns a dict_keys() object instead
> of list on Python 3. So this patch replaces that with list comprehension
> which works both on Python 2 and 3.
>
> diff -r 69aafa5d3e0f -r b5673a089936 mercurial/dirstate.py
> --- a/mercurial/dirstate.py     Thu Mar 16 08:57:53 2017 +0530
> +++ b/mercurial/dirstate.py     Thu Mar 16 09:00:27 2017 +0530
> @@ -1079,7 +1079,7 @@
>              # a) not matching matchfn b) ignored, c) missing, or d) under a
>              # symlink directory.
>              if not results and matchalways:
> -                visit = dmap.keys()
> +                visit = [f for f in dmap]

Dirstate performance is critical. I don't know if this particular
place matters. Could you see if this makes a measurable difference on
a large repo like Mozilla's?

>              else:
>                  visit = [f for f in dmap if f not in results and matchfn(f)]
>              visit.sort()
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
via Mercurial-devel - March 16, 2017, 5:25 a.m.
On Wed, Mar 15, 2017 at 10:19 PM, Pulkit Goyal <7895pulkit@gmail.com> wrote:
>
>
> On Thu, Mar 16, 2017 at 10:30 AM, Martin von Zweigbergk
> <martinvonz@google.com> wrote:
>>
>> On Wed, Mar 15, 2017 at 9:13 PM, Pulkit Goyal <7895pulkit@gmail.com>
>> wrote:
>> > # HG changeset patch
>> > # User Pulkit Goyal <7895pulkit@gmail.com>
>> > # Date 1489635027 -19800
>> > #      Thu Mar 16 09:00:27 2017 +0530
>> > # Node ID b5673a08993652a92c0e20a4e24d842194872454
>> > # Parent  69aafa5d3e0f109c7b501758381f65fd9b375196
>> > dirstate: use list comprehension to get a list of keys
>> >
>> > We have used dict.keys() which returns a dict_keys() object instead
>> > of list on Python 3. So this patch replaces that with list comprehension
>> > which works both on Python 2 and 3.
>> >
>> > diff -r 69aafa5d3e0f -r b5673a089936 mercurial/dirstate.py
>> > --- a/mercurial/dirstate.py     Thu Mar 16 08:57:53 2017 +0530
>> > +++ b/mercurial/dirstate.py     Thu Mar 16 09:00:27 2017 +0530
>> > @@ -1079,7 +1079,7 @@
>> >              # a) not matching matchfn b) ignored, c) missing, or d)
>> > under a
>> >              # symlink directory.
>> >              if not results and matchalways:
>> > -                visit = dmap.keys()
>> > +                visit = [f for f in dmap]
>>
>> Dirstate performance is critical. I don't know if this particular
>> place matters. Could you see if this makes a measurable difference on
>> a large repo like Mozilla's?
>
>
> I didn't tried on Mozilla but I can see significance performance difference.

Where did you see that difference? Simply when running "hg status" in
the hg core repo?

> I will send a V2 with not touching the dirstate code.

Note that 1-3 are already queued.
via Mercurial-devel - March 16, 2017, 6:06 a.m.
On Wed, Mar 15, 2017 at 10:40 PM, Pulkit Goyal <7895pulkit@gmail.com> wrote:
>
>
> On Thu, Mar 16, 2017 at 10:55 AM, Martin von Zweigbergk
> <martinvonz@google.com> wrote:
>>
>> On Wed, Mar 15, 2017 at 10:19 PM, Pulkit Goyal <7895pulkit@gmail.com>
>> wrote:
>> >
>> >
>> > On Thu, Mar 16, 2017 at 10:30 AM, Martin von Zweigbergk
>> > <martinvonz@google.com> wrote:
>> >>
>> >> On Wed, Mar 15, 2017 at 9:13 PM, Pulkit Goyal <7895pulkit@gmail.com>
>> >> wrote:
>> >> > # HG changeset patch
>> >> > # User Pulkit Goyal <7895pulkit@gmail.com>
>> >> > # Date 1489635027 -19800
>> >> > #      Thu Mar 16 09:00:27 2017 +0530
>> >> > # Node ID b5673a08993652a92c0e20a4e24d842194872454
>> >> > # Parent  69aafa5d3e0f109c7b501758381f65fd9b375196
>> >> > dirstate: use list comprehension to get a list of keys
>> >> >
>> >> > We have used dict.keys() which returns a dict_keys() object instead
>> >> > of list on Python 3. So this patch replaces that with list
>> >> > comprehension
>> >> > which works both on Python 2 and 3.
>> >> >
>> >> > diff -r 69aafa5d3e0f -r b5673a089936 mercurial/dirstate.py
>> >> > --- a/mercurial/dirstate.py     Thu Mar 16 08:57:53 2017 +0530
>> >> > +++ b/mercurial/dirstate.py     Thu Mar 16 09:00:27 2017 +0530
>> >> > @@ -1079,7 +1079,7 @@
>> >> >              # a) not matching matchfn b) ignored, c) missing, or d)
>> >> > under a
>> >> >              # symlink directory.
>> >> >              if not results and matchalways:
>> >> > -                visit = dmap.keys()
>> >> > +                visit = [f for f in dmap]
>> >>
>> >> Dirstate performance is critical. I don't know if this particular
>> >> place matters. Could you see if this makes a measurable difference on
>> >> a large repo like Mozilla's?
>> >
>> >
>> > I didn't tried on Mozilla but I can see significance performance
>> > difference.
>>
>> Where did you see that difference? Simply when running "hg status" in
>> the hg core repo?
>
>
> I did the following and saw significant difference. This can be wrong as I
> don't know much about the factors taken into consideration while computing
> the time.
>
>>>> dic = {1:2, 3:4, 5:6, 7:8}
>
>>>> def a():
> ...     return dic.keys()
> ...
>
>>>> def b():
> ...     return [ls for ls in dic]
> ...
>
>>>> timeit.timeit(b, number=1000)
> 0.0006921291351318359
>
>>>> timeit.timeit(a, number=1000)
> 0.0004601478576660156

That is quite a significant difference. However, it may not be
significant in the context of any commands the user may run, and if
that's not the case, it doesn't matter if it got slower.

Actually, I ended up testing it myself on the Mozilla repo. It turns
out "hg status -a" causes that line to be run. I could not see any
difference at all in the time of "hg status -a" (stable at 450ms). The
repo had about 67k files at the revision I tested. So I'll queue this
patch too, thanks.

>
>>
>>
>> > I will send a V2 with not touching the dirstate code.
>>
>> Note that 1-3 are already queued.
>
>
> Thanks!
>
>

Patch

diff -r 69aafa5d3e0f -r b5673a089936 mercurial/dirstate.py
--- a/mercurial/dirstate.py	Thu Mar 16 08:57:53 2017 +0530
+++ b/mercurial/dirstate.py	Thu Mar 16 09:00:27 2017 +0530
@@ -1079,7 +1079,7 @@ 
             # a) not matching matchfn b) ignored, c) missing, or d) under a
             # symlink directory.
             if not results and matchalways:
-                visit = dmap.keys()
+                visit = [f for f in dmap]
             else:
                 visit = [f for f in dmap if f not in results and matchfn(f)]
             visit.sort()