Patchwork [3,of,5,STABLE,v2] revset: more optimally handle __sub__ (issue4352)

login
register
mail settings
Submitter Gregory Szorc
Date Sept. 8, 2014, 9:59 p.m.
Message ID <b2a51ed00298036887b5.1410213585@gps-mbp.local>
Download mbox | patch
Permalink /patch/5729/
State Rejected
Headers show

Comments

Gregory Szorc - Sept. 8, 2014, 9:59 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1410195265 25200
#      Mon Sep 08 09:54:25 2014 -0700
# Node ID b2a51ed00298036887b54108e3c7faa10a2d8417
# Parent  3e8808bd267f9c2f6a650247a3a4ba926ee6c624
revset: more optimally handle __sub__ (issue4352)

This patch introduces the _diffset class. This class is used to more
optimally compute set differences (__sub__).

The initial implementation in this patch is not yet fully optimal.
If the set on the right hand side of the "-" operator is a lazy set,
it will be evaluated fully before results are returned. In the
future, this set should be iterated much like _addset does it.

The new class is not used anywhere yet. Upcoming patches will change
that.
Pierre-Yves David - Sept. 17, 2014, 3:42 a.m.
On 09/08/2014 02:59 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1410195265 25200
> #      Mon Sep 08 09:54:25 2014 -0700
> # Node ID b2a51ed00298036887b54108e3c7faa10a2d8417
> # Parent  3e8808bd267f9c2f6a650247a3a4ba926ee6c624
> revset: more optimally handle __sub__ (issue4352)

Ok so the regression here (before this patches) is that the substraction 
X - Y is done by doing a Y membership testing for all element in the X 
set. If the membership testing is expensive to do, this slow down 
significantly compared to doing the whole computation and using set 
membership testing.

Before lazy revset we were doing the full computation and set membership 
test. This changeset reinstate this in the substraction case.

This is good for set that are cheap to compute but for more expensive 
revset (like author), the membership testing on a few element may be 
more effective.

The improvement added in another patches is doing full traversal of the 
X too.

So I'm not sure this is the right fix and will poke at it a bit.

We need to find an approriate fix on stable for the mozilla (and other 
gigantic repo) people.
Pierre-Yves David - Sept. 18, 2014, 5:26 a.m.
On 09/16/2014 08:42 PM, Pierre-Yves David wrote:
>
>
> On 09/08/2014 02:59 PM, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1410195265 25200
>> #      Mon Sep 08 09:54:25 2014 -0700
>> # Node ID b2a51ed00298036887b54108e3c7faa10a2d8417
>> # Parent  3e8808bd267f9c2f6a650247a3a4ba926ee6c624
>> revset: more optimally handle __sub__ (issue4352)
>
> Ok so the regression here (before this patches) is that the substraction
> X - Y is done by doing a Y membership testing for all element in the X
> set. If the membership testing is expensive to do, this slow down
> significantly compared to doing the whole computation and using set
> membership testing.
>
> Before lazy revset we were doing the full computation and set membership
> test. This changeset reinstate this in the substraction case.
>
> This is good for set that are cheap to compute but for more expensive
> revset (like author), the membership testing on a few element may be
> more effective.
>
> The improvement added in another patches is doing full traversal of the
> X too.

My bad, the revset engine turns "X - Y" into "X - (X & Y)" (and this 
change is part of what make the whole thing slow).

However, the current thing is slow because lazy revset build a gigantic 
pile a crap and that element in the pile of crap itself are slow. In my 
opinion, the first move here is to make this pile of crap less pily and 
less crapy. I started playing in that direction and I got interesting 
results.

revset #0: roots((0::) - (0::tip))
0) wall 2.060527 comb 2.060000 user 2.050000 sys 0.010000 (best of 5)
1) wall 0.156491 comb 0.160000 user 0.160000 sys 0.000000 (best of 59)

I just sent another patch to do a quickfix on stable regarding the phase 
movement issue.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2625,8 +2625,32 @@  class _addset(_lazysetiteratormixin):
 
     def __contains__(self, x):
         return x in self._r1 or x in self._r2
 
+class _diffset(_lazysetiteratormixin):
+    """Represents the difference between two sets.
+
+    This class is similar to _addset. It exists to optimally compute the
+    difference between two sets. Its impact is most felt when operating
+    on two lazy sets.
+    """
+    def __contains__(self, x):
+        return x in self._list
+
+    def _iterator(self):
+        if self._iter is None:
+            def gen():
+                # We could further optimize this if r2 is ordered. See
+                # _addset._iterator for inspiration.
+                s = self._r2.set()
+                for r in self._r1:
+                    if r not in s:
+                        yield r
+
+            self._iter = _generatorset(gen())
+
+        return self._iter
+
 class _generatorset(object):
     """Wrap a generator for lazy iteration
 
     Wrapper structure for generators that provides lazy membership and can