Patchwork parsers: optimize filtered headrevs logic

login
register
mail settings
Submitter Durham Goode
Date March 8, 2016, 8:27 a.m.
Message ID <d01268a43486912fe902.1457425647@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13671/
State Accepted
Headers show

Comments

Durham Goode - March 8, 2016, 8:27 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1457425208 28800
#      Tue Mar 08 00:20:08 2016 -0800
# Node ID d01268a43486912fe90280019fb21e36522e5d49
# Parent  9974b8236cac50945d7b529e7c4fae9cf4974443
parsers: optimize filtered headrevs logic

The old native head revs logic would iterate over every node, starting from 0,
and check if every node was filtered (by testing it against the filteredrevs
python set). On large repos with hundreds of thousands of commits, this could
take 150ms.

This new logic iterates over the nodes in reverse order, and skips the filtered
check if we've seen an unfiltered child of the node. This saves approximately a
bagillion filteredrevs set checks, which shaves the time down from 150ms to
20ms during every branch cache write.
Augie Fackler - March 8, 2016, 3:11 p.m.
On Tue, Mar 08, 2016 at 12:27:27AM -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1457425208 28800
> #      Tue Mar 08 00:20:08 2016 -0800
> # Node ID d01268a43486912fe90280019fb21e36522e5d49
> # Parent  9974b8236cac50945d7b529e7c4fae9cf4974443
> parsers: optimize filtered headrevs logic

Nice, queued.

>
> The old native head revs logic would iterate over every node, starting from 0,
> and check if every node was filtered (by testing it against the filteredrevs
> python set). On large repos with hundreds of thousands of commits, this could
> take 150ms.
>
> This new logic iterates over the nodes in reverse order, and skips the filtered
> check if we've seen an unfiltered child of the node. This saves approximately a

> bagillion

I laughed.


> filteredrevs set checks, which shaves the time down from 150ms to
> 20ms during every branch cache write.
>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -1446,20 +1446,26 @@ static PyObject *index_headrevs(indexObj
>               goto bail;
>       }
>
> -	for (i = 0; i < len; i++) {
> +	for (i = len - 1; i >= 0; i--) {
>               int isfiltered;
>               int parents[2];
>
> -		isfiltered = check_filter(filter, i);
> -		if (isfiltered == -1) {
> -			PyErr_SetString(PyExc_TypeError,
> -				"unable to check filter");
> -			goto bail;
> -		}
> -
> -		if (isfiltered) {
> -			nothead[i] = 1;
> -			continue;
> +		/* If nothead[i] == 1, it means we've seen an unfiltered child of this
> +              * node already, and therefore this node is not filtered. So we can skip
> +              * the expensive check_filter step.
> +              */
> +		if (nothead[i] != 1) {
> +			isfiltered = check_filter(filter, i);
> +			if (isfiltered == -1) {
> +				PyErr_SetString(PyExc_TypeError,
> +					"unable to check filter");
> +				goto bail;
> +			}
> +
> +			if (isfiltered) {
> +				nothead[i] = 1;
> +				continue;
> +			}
>               }
>
>               if (index_get_parents(self, i, parents, (int)len - 1) < 0)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -1446,20 +1446,26 @@  static PyObject *index_headrevs(indexObj
 		goto bail;
 	}
 
-	for (i = 0; i < len; i++) {
+	for (i = len - 1; i >= 0; i--) {
 		int isfiltered;
 		int parents[2];
 
-		isfiltered = check_filter(filter, i);
-		if (isfiltered == -1) {
-			PyErr_SetString(PyExc_TypeError,
-				"unable to check filter");
-			goto bail;
-		}
-
-		if (isfiltered) {
-			nothead[i] = 1;
-			continue;
+		/* If nothead[i] == 1, it means we've seen an unfiltered child of this
+		 * node already, and therefore this node is not filtered. So we can skip
+		 * the expensive check_filter step.
+		 */
+		if (nothead[i] != 1) {
+			isfiltered = check_filter(filter, i);
+			if (isfiltered == -1) {
+				PyErr_SetString(PyExc_TypeError,
+					"unable to check filter");
+				goto bail;
+			}
+
+			if (isfiltered) {
+				nothead[i] = 1;
+				continue;
+			}
 		}
 
 		if (index_get_parents(self, i, parents, (int)len - 1) < 0)