Patchwork [3,of,3] revset: improve roots revset performance

login
register
mail settings
Submitter Durham Goode
Date April 1, 2014, 2:49 a.m.
Message ID <157d96db8eaf11d34301.1396320592@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/4177/
State Accepted
Commit f52e4ca93529d876f36631583645b8fa34b5a8d2
Headers show

Comments

Durham Goode - April 1, 2014, 2:49 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1396307014 25200
#      Mon Mar 31 16:03:34 2014 -0700
# Node ID 157d96db8eaf11d343017477847ba8719d52eeca
# Parent  aa93084956fd536092f56eb2347dcb723e153d94
revset: improve roots revset performance

Previously we would iterate over every item in the subset, checking if it was in
the provided args. This often meant iterating over every rev in the repo.

Now we iterate over the args provided, checking if they exist in the subset.
On a large repo this brings setting phase boundaries (which use this revset
roots(X:: - X::Y)) down from 0.8 seconds to 0.4 seconds.

The "roots((tip~100::) - (tip~100::tip))" revset in revsetbenchmarks shows it
going from 0.12s to 0.10s, so we should be able to catch regressions here in the
future.

This actually introduces a regression in 'roots(all())' (0.2s to 0.26s) since
we're now using spansets, which are slightly slower to do containment checks on.
I believe this trade off is worth it, since it makes the revset O(number of
args) instead of O(size of repo).
Matt Mackall - April 2, 2014, 9:39 p.m.
On Mon, 2014-03-31 at 19:49 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1396307014 25200
> #      Mon Mar 31 16:03:34 2014 -0700
> # Node ID 157d96db8eaf11d343017477847ba8719d52eeca
> # Parent  aa93084956fd536092f56eb2347dcb723e153d94
> revset: improve roots revset performance

These are queued for default, thanks.

> This actually introduces a regression in 'roots(all())' (0.2s to 0.26s) since
> we're now using spansets, which are slightly slower to do containment checks on.
> I believe this trade off is worth it, since it makes the revset O(number of
> args) instead of O(size of repo).

I'm pretty sure there are now a bunch of new opportunities for
micro-optimization in here which can win back these losses. 

But yes, when the dust settles, we're going to have a bit of a mixed bag
of regressions vs improvements here since we've changed basically
everything. But I think on the whole, we're going to have a bunch of
large wins weighing against small losses.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1480,8 +1480,8 @@ 
     """``roots(set)``
     Changesets in set with no parent changeset in set.
     """
-    s = getset(repo, baseset(repo.changelog), x).set()
-    subset = baseset([r for r in subset if r in s])
+    s = getset(repo, spanset(repo), x).set()
+    subset = baseset([r for r in s if r in subset.set()])
     cs = _children(repo, subset, s)
     return subset - cs