Patchwork parsers: when available, use a presized dictionary for the file foldmap

login
register
mail settings
Submitter Siddharth Agarwal
Date April 15, 2015, 9:38 p.m.
Message ID <8ca386e305d52af5e714.1429133932@devbig136.prn2.facebook.com>
Download mbox | patch
Permalink /patch/8694/
State Accepted
Headers show

Comments

Siddharth Agarwal - April 15, 2015, 9:38 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1429133744 25200
#      Wed Apr 15 14:35:44 2015 -0700
# Node ID 8ca386e305d52af5e714126173bc15c884bf16fc
# Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
parsers: when available, use a presized dictionary for the file foldmap

On a repo with over 300,000 files, this speeds up perffilefoldmap:

before: wall 0.178421 comb 0.180000 user 0.160000 sys 0.020000 (best of 55)
after:  wall 0.164462 comb 0.160000 user 0.140000 sys 0.020000 (best of 59)
Pierre-Yves David - April 15, 2015, 10:07 p.m.
On 04/15/2015 05:38 PM, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal<sid0@fb.com>
> # Date 1429133744 25200
> #      Wed Apr 15 14:35:44 2015 -0700
> # Node ID 8ca386e305d52af5e714126173bc15c884bf16fc
> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
> parsers: when available, use a presized dictionary for the file foldmap

This one is pushed to the clowncopter. Nice catch.
Augie Fackler - April 15, 2015, 10:35 p.m.
On Wed, Apr 15, 2015 at 02:38:52PM -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1429133744 25200
> #      Wed Apr 15 14:35:44 2015 -0700
> # Node ID 8ca386e305d52af5e714126173bc15c884bf16fc
> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
> parsers: when available, use a presized dictionary for the file foldmap

It might be interesting to expose a parsers.presizeddict() method to
our Python code as well.

Queued this, thanks.

>
> On a repo with over 300,000 files, this speeds up perffilefoldmap:
>
> before: wall 0.178421 comb 0.180000 user 0.160000 sys 0.020000 (best of 55)
> after:  wall 0.164462 comb 0.160000 user 0.140000 sys 0.020000 (best of 59)
>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -205,7 +205,20 @@ static PyObject *make_file_foldmap(PyObj
>               goto quit;
>       }
>
> +#if PY_VERSION_HEX >= 0x02060000
> +	/* _PyDict_NewPresized expects a minused parameter, but it actually
> +        creates a dictionary that's the nearest power of two bigger than the
> +        parameter. For example, with the initial minused = 1000, the
> +        dictionary created has size 1024. Of course in a lot of cases that
> +        can be greater than the maximum load factor Python's dict object
> +        expects (= 2/3), so as soon as we cross the threshold we'll resize
> +        anyway. So create a dictionary that's 3/2 the size. Also add some
> +        more to deal with additions outside this function. */
> +	file_foldmap = _PyDict_NewPresized((PyDict_Size(dmap) / 5) * 8);
> +#else
>       file_foldmap = PyDict_New();
> +#endif
> +
>       if (file_foldmap == NULL)
>               goto quit;
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Siddharth Agarwal - April 16, 2015, 7:47 a.m.
On 04/15/2015 06:35 PM, Augie Fackler wrote:
> On Wed, Apr 15, 2015 at 02:38:52PM -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1429133744 25200
>> #      Wed Apr 15 14:35:44 2015 -0700
>> # Node ID 8ca386e305d52af5e714126173bc15c884bf16fc
>> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
>> parsers: when available, use a presized dictionary for the file foldmap
> It might be interesting to expose a parsers.presizeddict() method to
> our Python code as well.

Probably easiest to do that the moment we drop < 2.6 support.

>
> Queued this, thanks.
>
>> On a repo with over 300,000 files, this speeds up perffilefoldmap:
>>
>> before: wall 0.178421 comb 0.180000 user 0.160000 sys 0.020000 (best of 55)
>> after:  wall 0.164462 comb 0.160000 user 0.140000 sys 0.020000 (best of 59)
>>
>> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
>> --- a/mercurial/parsers.c
>> +++ b/mercurial/parsers.c
>> @@ -205,7 +205,20 @@ static PyObject *make_file_foldmap(PyObj
>>               goto quit;
>>       }
>>
>> +#if PY_VERSION_HEX >= 0x02060000
>> +	/* _PyDict_NewPresized expects a minused parameter, but it actually
>> +        creates a dictionary that's the nearest power of two bigger than the
>> +        parameter. For example, with the initial minused = 1000, the
>> +        dictionary created has size 1024. Of course in a lot of cases that
>> +        can be greater than the maximum load factor Python's dict object
>> +        expects (= 2/3), so as soon as we cross the threshold we'll resize
>> +        anyway. So create a dictionary that's 3/2 the size. Also add some
>> +        more to deal with additions outside this function. */
>> +	file_foldmap = _PyDict_NewPresized((PyDict_Size(dmap) / 5) * 8);
>> +#else
>>       file_foldmap = PyDict_New();
>> +#endif
>> +
>>       if (file_foldmap == NULL)
>>               goto quit;
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -205,7 +205,20 @@  static PyObject *make_file_foldmap(PyObj
 		goto quit;
 	}
 
+#if PY_VERSION_HEX >= 0x02060000
+	/* _PyDict_NewPresized expects a minused parameter, but it actually
+	   creates a dictionary that's the nearest power of two bigger than the
+	   parameter. For example, with the initial minused = 1000, the
+	   dictionary created has size 1024. Of course in a lot of cases that
+	   can be greater than the maximum load factor Python's dict object
+	   expects (= 2/3), so as soon as we cross the threshold we'll resize
+	   anyway. So create a dictionary that's 3/2 the size. Also add some
+	   more to deal with additions outside this function. */
+	file_foldmap = _PyDict_NewPresized((PyDict_Size(dmap) / 5) * 8);
+#else
 	file_foldmap = PyDict_New();
+#endif
+
 	if (file_foldmap == NULL)
 		goto quit;