Patchwork [3,of,6] revset: do not try to sort revisions unnecessarily by addset._iterator()

login
register
mail settings
Submitter Yuya Nishihara
Date May 12, 2015, 10:59 p.m.
Message ID <cc7253d43150ef937aca.1431471577@mimosa>
Download mbox | patch
Permalink /patch/9032/
State Accepted
Headers show

Comments

Yuya Nishihara - May 12, 2015, 10:59 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1427716479 -32400
#      Mon Mar 30 20:54:39 2015 +0900
# Node ID cc7253d43150ef937aca205f2cb66a569378aeea
# Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
revset: do not try to sort revisions unnecessarily by addset._iterator()

This fixes addset to remove duplicated revisions from right-hand-side set
even if one of the subset does not support fastasc/fastdesc.

addset._iterator() did not do the right thing if _ascending is set, because
addset._iterordered() expects that the given sets are sorted. This patch
removes the broken implementation. There's no need for addset._iterator()
to sort revisions because the result will be sorted by _trysetasclist()
as necessary.
Pierre-Yves David - May 13, 2015, 12:52 a.m.
On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1427716479 -32400
> #      Mon Mar 30 20:54:39 2015 +0900
> # Node ID cc7253d43150ef937aca205f2cb66a569378aeea
> # Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
> revset: do not try to sort revisions unnecessarily by addset._iterator()
>
> This fixes addset to remove duplicated revisions from right-hand-side set
> even if one of the subset does not support fastasc/fastdesc.
>
> addset._iterator() did not do the right thing if _ascending is set, because
> addset._iterordered() expects that the given sets are sorted. This patch
> removes the broken implementation. There's no need for addset._iterator()
> to sort revisions because the result will be sorted by _trysetasclist()
> as necessary.

I agree the current behavior is buggy. But I'm dubious about your the 
way to solve it.

The special casing of 'self._ascending' is a good correctness and 
optimisation purpose.

I see two potential issues here:

1) we use iterator from self._r1 and self._r2 without ensuring they are 
sorted in the order we need them to be sorted (or using appropriate 
directed iterator).

2) _iterordered is apparently not doing deduplication.

Fixing these two should solve the bug without performance and 
correctness impact?

As far as I understand your patch,  sorting the addset would result into 
a sorted output most case.

I also would like you to check performance impact when touching revset 
code. This means looking at the output from 
'contrib/revsetbenchmarks.py' with at least content of 
'contrib/revsetbenchmarks.t' (if the help of the script is too bad, 
point it out and we'll improve that).

On other topic unrelated topic (just look at the code):

- the _list is thingy is probably an unnecessary factorisation since it 
is very lightly used.
- I'm not sure there is value in using a baseset for _genlist instead of 
a list.
Yuya Nishihara - May 13, 2015, 2:39 p.m.
On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote:
> On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1427716479 -32400
> > #      Mon Mar 30 20:54:39 2015 +0900
> > # Node ID cc7253d43150ef937aca205f2cb66a569378aeea
> > # Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
> > revset: do not try to sort revisions unnecessarily by addset._iterator()
> >
> > This fixes addset to remove duplicated revisions from right-hand-side set
> > even if one of the subset does not support fastasc/fastdesc.
> >
> > addset._iterator() did not do the right thing if _ascending is set, because
> > addset._iterordered() expects that the given sets are sorted. This patch
> > removes the broken implementation. There's no need for addset._iterator()
> > to sort revisions because the result will be sorted by _trysetasclist()
> > as necessary.
> 
> I agree the current behavior is buggy. But I'm dubious about your the 
> way to solve it.
> 
> The special casing of 'self._ascending' is a good correctness and 
> optimisation purpose.
> 
> I see two potential issues here:
> 
> 1) we use iterator from self._r1 and self._r2 without ensuring they are 
> sorted in the order we need them to be sorted (or using appropriate 
> directed iterator).
> 
> 2) _iterordered is apparently not doing deduplication.
> 
> Fixing these two should solve the bug without performance and 
> correctness impact?
> 
> As far as I understand your patch,  sorting the addset would result into 
> a sorted output most case.

_iterordered() can remove duplicates if both r1 and r2 are sorted. So we have
to fix (1).

But, when _iterator() is called with _ascending is not None (*1), we know at
least one of r1 or r2 does not support fastasc/fastdesc. So we'll have to
consume r1 or r2 anyway to get it sorted. I think the use of _iterordered() had
little benefit in this case.

 (*1) except when called from __len__(), which can be assumed to be slow

> I also would like you to check performance impact when touching revset 
> code. This means looking at the output from 
> 'contrib/revsetbenchmarks.py' with at least content of 
> 'contrib/revsetbenchmarks.t' (if the help of the script is too bad, 
> point it out and we'll improve that).

Hmm, I got notable difference by "roots((0:tip)::)". This seems to be caused
by the additional for-loop introduced by "revset: make addset class accepts
more than two subsets". Perhaps, I'll have to optimize addset.__contains__ and
_iterordered() for two subsets.

0) Revision: commit: no longer allow empty commit with the 'force' argument (API)
1) Revision: revset: do not nest addset by "addset + other" (issue4565)

revset #25: roots((0:tip)::)
0) wall 0.124226 comb 0.120000 user 0.120000 sys 0.000000 (best of 78)
1) wall 0.192682 comb 0.190000 user 0.190000 sys 0.000000 (best of 50)

Regards,

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2974,10 +2974,10 @@  class addset(abstractsmartset):
     >>> [x for x in rs], [x for x in rs.fastasc()]  # without _asclist
     ([0, 2, 3, 4, 5], [0, 2, 3, 4, 5])
     >>> assert not rs._asclist
-    >>> len(rs)  # BROKEN
-    6
-    >>> [x for x in rs], [x for x in rs.fastasc()]  # BROKEN with _asclist
-    ([0, 2, 2, 3, 4, 5], [0, 2, 2, 3, 4, 5])
+    >>> len(rs)
+    5
+    >>> [x for x in rs], [x for x in rs.fastasc()]  # with _asclist
+    ([0, 2, 3, 4, 5], [0, 2, 3, 4, 5])
     >>> assert rs._asclist
 
     iterate descending:
@@ -2985,23 +2985,23 @@  class addset(abstractsmartset):
     >>> [x for x in rs], [x for x in rs.fastdesc()]  # without _asclist
     ([5, 4, 3, 2, 0], [5, 4, 3, 2, 0])
     >>> assert not rs._asclist
-    >>> len(rs)  # BROKEN
-    6
-    >>> [x for x in rs], [x for x in rs.fastdesc()]  # BROKEN with _asclist
-    ([5, 4, 3, 2, 2, 0], [5, 4, 3, 2, 2, 0])
+    >>> len(rs)
+    5
+    >>> [x for x in rs], [x for x in rs.fastdesc()]  # with _asclist
+    ([5, 4, 3, 2, 0], [5, 4, 3, 2, 0])
     >>> assert rs._asclist
 
     iterate ascending without fastasc:
     >>> rs = addset(xs, generatorset(ys), ascending=True)
     >>> assert rs.fastasc is None
-    >>> [x for x in rs]  # BROKEN
-    [0, 2, 2, 3, 4, 5]
+    >>> [x for x in rs]
+    [0, 2, 3, 4, 5]
 
     iterate descending without fastdesc:
     >>> rs = addset(generatorset(xs), ys, ascending=False)
     >>> assert rs.fastdesc is None
-    >>> [x for x in rs]  # BROKEN
-    [5, 4, 3, 2, 2, 0]
+    >>> [x for x in rs]
+    [5, 4, 3, 2, 0]
     """
     def __init__(self, revs1, revs2, ascending=None):
         self._r1 = revs1
@@ -3026,27 +3026,16 @@  class addset(abstractsmartset):
     def _iterator(self):
         """Iterate over both collections without repeating elements
 
-        If the ascending attribute is not set, iterate over the first one and
-        then over the second one checking for membership on the first one so we
-        dont yield any duplicates.
-
-        If the ascending attribute is set, iterate over both collections at the
-        same time, yielding only one value at a time in the given order.
+        No matter if the ascending attribute is set, iterate over the first one
+        and then over the second one checking for membership on the first one
+        so we don't yield any duplicates.
         """
-        if self._ascending is None:
-            def gen():
-                for r in self._r1:
-                    yield r
-                inr1 = self._r1.__contains__
-                for r in self._r2:
-                    if not inr1(r):
-                        yield r
-            gen = gen()
-        else:
-            iter1 = iter(self._r1)
-            iter2 = iter(self._r2)
-            gen = self._iterordered(self._ascending, iter1, iter2)
-        return gen
+        for r in self._r1:
+            yield r
+        inr1 = self._r1.__contains__
+        for r in self._r2:
+            if not inr1(r):
+                yield r
 
     def __iter__(self):
         if self._ascending is None: