Patchwork [2,of,3] revset: improve _descendants performance

login
register
mail settings
Submitter Durham Goode
Date April 1, 2014, 2:49 a.m.
Message ID <aa93084956fd536092f5.1396320591@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/4175/
State Accepted
Commit 04e1596d5dbdc7cb14f16c1ed60c22d599cf24de
Headers show

Comments

Durham Goode - April 1, 2014, 2:49 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1395781801 25200
#      Tue Mar 25 14:10:01 2014 -0700
# Node ID aa93084956fd536092f56eb2347dcb723e153d94
# Parent  ab02d288a5968b12c2d56bc4d93291a6edf2af82
revset: improve _descendants performance

Previously revset._descendants would iterate over the entire subset (which is
often the entire repo) and test if each rev was in the descendants list. This is
really slow on large repos (3+ seconds).

Now we iterate over the descendants and test if they're in the subset.
This affects advancing and retracting the phase boundary (3.5 seconds down to
0.8 seconds, which is even faster than it was in 2.9). Also affects commands
that move the phase boundary (commit and rebase, presumably).

The new revsetbenchmark indicates an improvement from 0.2 to 0.12 seconds. So
future revset changes should be able to notice regressions.

I removed a bad test. It was recently added and tested '1:: and reverse(all())',
which has an amibiguous output direction.  Previously it printed in reverse order,
because we iterated over the subset (the reverse part). Now it prints in normal
order because we iterate over the 1:: . Since the revset itself doesn't imply an
order, I removed the test.

Patch

diff --git a/contrib/revsetbenchmarks.txt b/contrib/revsetbenchmarks.txt
--- a/contrib/revsetbenchmarks.txt
+++ b/contrib/revsetbenchmarks.txt
@@ -12,3 +12,4 @@ 
 min(0:tip)
 0::
 min(0::)
+roots((tip~100::) - (tip~100::tip))
diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -661,8 +661,18 @@ 
     if not args:
         return baseset([])
     s = _revdescendants(repo, args, followfirst)
-    a = set(args)
-    return subset.filter(lambda r: r in s or r in a)
+
+    # Both sets need to be ascending in order to lazily return the union
+    # in the correct order.
+    args.ascending()
+
+    subsetset = subset.set()
+    result = (orderedlazyset(s, subsetset.__contains__, ascending=True) +
+              orderedlazyset(args, subsetset.__contains__, ascending=True))
+
+    # Wrap result in a lazyset since it's an _addset, which doesn't implement
+    # all the necessary functions to be consumed by callers.
+    return orderedlazyset(result, lambda r: True, ascending=True)
 
 def descendants(repo, subset, x):
     """``descendants(set)``
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -438,16 +438,6 @@ 
   2
   1
   0
-  $ log '1:: and reverse(all())'
-  9
-  8
-  7
-  6
-  5
-  4
-  3
-  2
-  1
   $ log 'rev(5)'
   5
   $ log 'sort(limit(reverse(all()), 3))'