Patchwork [STABLE,v2] revset: optimize baseset.__sub__ (issue4313)

login
register
mail settings
Submitter Gregory Szorc
Date July 24, 2014, 7:34 p.m.
Message ID <105ecd567b0bf59b1da3.1406230441@77.1.168.192.in-addr.arpa>
Download mbox | patch
Permalink /patch/5196/
State Accepted
Commit f486001f9d6fc7dd161016eedfa68a42b7a2840a
Headers show

Comments

Gregory Szorc - July 24, 2014, 7:34 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1406229132 25200
#      Thu Jul 24 12:12:12 2014 -0700
# Branch stable
# Node ID 105ecd567b0bf59b1da35124a6c7765793ac3962
# Parent  54ff2789d75e0662a536b3da61ca25c07436966b
revset: optimize baseset.__sub__ (issue4313)

dd716807fd23 regressed performance of baseset.__sub__ by introducing
a lazyset. This patch restores that lost performance by eagerly
evaluating baseset.__sub__ if the other set is a baseset.

revsetbenchmark.py results impacted by this change:

revset #6: roots(0::tip)
0) wall 2.923473 comb 2.920000 user 2.920000 sys 0.000000 (best of 4)
1) wall 0.077614 comb 0.080000 user 0.080000 sys 0.000000 (best of 100)

revset #23: roots((0:tip)::)
0) wall 2.875178 comb 2.880000 user 2.880000 sys 0.000000 (best of 4)
1) wall 0.154519 comb 0.150000 user 0.150000 sys 0.000000 (best of 61)

On the author's machine, this slowdown manifested during evaluation of
'roots(%ln::)' in phases.retractboundary after unbundling the Firefox
repository. Using `time hg unbundle firefox.hg` as a benchmark:

Before: 8:00
After:  4:28
Delta: -3:32

For reference, the subset and cs baseset instances impacted by this
change were of lengths 193634 and 193627, respectively.

Explicit test coverage of roots(%ln::), while similar to the existing
roots(0::tip) benchmark, has been added.
Matt Mackall - July 25, 2014, 1:49 p.m.
On Thu, 2014-07-24 at 12:34 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1406229132 25200
> #      Thu Jul 24 12:12:12 2014 -0700
> # Branch stable
> # Node ID 105ecd567b0bf59b1da35124a6c7765793ac3962
> # Parent  54ff2789d75e0662a536b3da61ca25c07436966b
> revset: optimize baseset.__sub__ (issue4313)

Queued for stable, thanks.

Patch

diff --git a/contrib/revsetbenchmarks.txt b/contrib/revsetbenchmarks.txt
--- a/contrib/revsetbenchmarks.txt
+++ b/contrib/revsetbenchmarks.txt
@@ -20,4 +20,5 @@  public()
 :10000 and public()
 draft()
 :10000 and draft()
 max(::(tip~20) - obsolete())
+roots((0:tip)::)
diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2231,8 +2231,15 @@  class baseset(list):
     def __sub__(self, other):
         """Returns a new object with the substraction of the two collections.
 
         This is part of the mandatory API for smartset."""
+        # If we are operating on 2 baseset, do the computation now since all
+        # data is available. The alternative is to involve a lazyset, which
+        # may be slow.
+        if isinstance(other, baseset):
+            other = other.set()
+            return baseset([x for x in self if x not in other])
+
         return self.filter(lambda x: x not in other)
 
     def __and__(self, other):
         """Returns a new object with the intersection of the two collections.