| Submitter | Yuya Nishihara |
|---|---|
| Date | May 12, 2015, 10:59 p.m. |
| Message ID | <cc7253d43150ef937aca.1431471577@mimosa> |
| Download | mbox / patch |
| Permalink | /patch/9032/ |
| State | Accepted |
| Headers | show |
Comments
On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1427716479 -32400
> #      Mon Mar 30 20:54:39 2015 +0900
> # Node ID cc7253d43150ef937aca205f2cb66a569378aeea
> # Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
> revset: do not try to sort revisions unnecessarily by addset._iterator()
>
> This fixes addset to remove duplicated revisions from the right-hand-side set
> even if one of the subsets does not support fastasc/fastdesc.
>
> addset._iterator() did not do the right thing if _ascending is set, because
> addset._iterordered() expects that the given sets are sorted. This patch
> removes the broken implementation. There's no need for addset._iterator()
> to sort revisions because the result will be sorted by _trysetasclist()
> as necessary.

I agree the current behavior is buggy, but I'm dubious about the way you
solve it.

The special-casing of 'self._ascending' serves both correctness and
optimisation purposes.

I see two potential issues here:

1) we use the iterators from self._r1 and self._r2 without ensuring they
   are sorted in the order we need them to be (or without using an
   appropriate directed iterator).

2) _iterordered is apparently not doing deduplication.

Fixing these two should solve the bug without performance or correctness
impact? As far as I understand your patch, sorting the addset would result
in a sorted output in most cases.

I would also like you to check the performance impact when touching revset
code. This means looking at the output of 'contrib/revsetbenchmarks.py'
with at least the content of 'contrib/revsetbenchmarks.t' (if the help of
the script is too unclear, point it out and we'll improve it).

On an unrelated topic (I just looked at the code):

- the _list thingy is probably an unnecessary factorisation since it is
  very lightly used.
- I'm not sure there is value in using a baseset for _genlist instead of a
  plain list.
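The contract at the heart of issues (1) and (2) can be sketched with a small, hypothetical Python model (not Mercurial's actual code): an ordered merge deduplicates by comparing adjacent values, which only works when both inputs are already sorted.

```python
import heapq

def iterordered_asc(iter1, iter2):
    """Yield the ascending, deduplicated union of two *sorted* iterators.

    Hypothetical model of what addset._iterordered() is expected to do;
    the name and details are illustrative, not Mercurial's actual code.
    """
    last = object()  # sentinel distinct from any revision number
    for r in heapq.merge(iter1, iter2):
        if r != last:  # adjacent-duplicate check: valid only on sorted input
            yield r
            last = r

# Sorted inputs: the shared revision 3 is emitted once.
#   iterordered_asc(iter([0, 3, 5]), iter([2, 3, 4]))  ->  0, 2, 3, 4, 5
# Unsorted input breaks the invariant: duplicates are no longer adjacent,
# so revision 0 leaks through twice.
#   iterordered_asc(iter([3, 0]), iter([0, 1]))  ->  0, 1, 3, 0
```

This is why feeding `iter(self._r1)` and `iter(self._r2)` to the merge without ensuring their order silently produces duplicates.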
On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote:
> On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
> > [...]
> > revset: do not try to sort revisions unnecessarily by addset._iterator()
> > [...]
>
> I agree the current behavior is buggy. But I'm dubious about the way you
> solve it.
>
> I see two potential issues here:
>
> 1) we use iterators from self._r1 and self._r2 without ensuring they are
> sorted in the order we need them to be sorted (or using an appropriate
> directed iterator).
>
> 2) _iterordered is apparently not doing deduplication.
>
> Fixing these two should solve the bug without performance and
> correctness impact?

_iterordered() can remove duplicates only if both r1 and r2 are sorted, so
we would have to fix (1). But when _iterator() is called with _ascending
not None (*1), we know at least one of r1 or r2 does not support
fastasc/fastdesc, so we have to consume r1 or r2 anyway to get it sorted.
I think the use of _iterordered() had little benefit in this case.

(*1) except when called from __len__(), which can be assumed to be slow

> I also would like you to check performance impact when touching revset
> code. This means looking at the output from
> 'contrib/revsetbenchmarks.py' with at least the content of
> 'contrib/revsetbenchmarks.t' (if the help of the script is too bad,
> point it out and we'll improve that).

Hmm, I got a notable difference for "roots((0:tip)::)". This seems to be
caused by the additional for-loop introduced by "revset: make addset class
accept more than two subsets". Perhaps I'll have to optimize
addset.__contains__ and _iterordered() for two subsets.

    0) Revision: commit: no longer allow empty commit with the 'force' argument (API)
    1) Revision: revset: do not nest addset by "addset + other" (issue4565)

    revset #25: roots((0:tip)::)
    0) wall 0.124226 comb 0.120000 user 0.120000 sys 0.000000 (best of 78)
    1) wall 0.192682 comb 0.190000 user 0.190000 sys 0.000000 (best of 50)

Regards,
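Yuya's "deduplicate first, sort later" argument can be sketched as follows. The names are hypothetical and only illustrate the strategy of the patch: _iterator() dedups in "r1 first, then r2" order, and sorting is deferred to a later step (what _trysetasclist() achieves) that materializes and sorts the list once.

```python
def iter_union_dedup(r1, r2):
    """Yield all of r1, then the elements of r2 not already in r1.

    Sketch of the patched _iterator() strategy; assumes r1 offers fast
    membership tests, as Mercurial's smartsets do via __contains__.
    """
    for r in r1:
        yield r
    in_r1 = r1.__contains__
    for r in r2:
        if not in_r1(r):
            yield r

def sorted_union(r1, r2, ascending=True):
    # Roughly what deferring to _trysetasclist() achieves: materialize
    # the deduplicated union once, then sort in the requested direction.
    return sorted(iter_union_dedup(r1, r2), reverse=not ascending)
```

Either way, at least one side must be fully consumed when it lacks fastasc/fastdesc, which is why the ordered merge bought little in that case.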
Patch
```diff
diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2974,10 +2974,10 @@ class addset(abstractsmartset):
     >>> [x for x in rs], [x for x in rs.fastasc()]  # without _asclist
     ([0, 2, 3, 4, 5], [0, 2, 3, 4, 5])
     >>> assert not rs._asclist
-    >>> len(rs)  # BROKEN
-    6
-    >>> [x for x in rs], [x for x in rs.fastasc()]  # BROKEN with _asclist
-    ([0, 2, 2, 3, 4, 5], [0, 2, 2, 3, 4, 5])
+    >>> len(rs)
+    5
+    >>> [x for x in rs], [x for x in rs.fastasc()]  # with _asclist
+    ([0, 2, 3, 4, 5], [0, 2, 3, 4, 5])
     >>> assert rs._asclist

     iterate descending:
@@ -2985,23 +2985,23 @@ class addset(abstractsmartset):
     >>> [x for x in rs], [x for x in rs.fastdesc()]  # without _asclist
     ([5, 4, 3, 2, 0], [5, 4, 3, 2, 0])
     >>> assert not rs._asclist
-    >>> len(rs)  # BROKEN
-    6
-    >>> [x for x in rs], [x for x in rs.fastdesc()]  # BROKEN with _asclist
-    ([5, 4, 3, 2, 2, 0], [5, 4, 3, 2, 2, 0])
+    >>> len(rs)
+    5
+    >>> [x for x in rs], [x for x in rs.fastdesc()]  # with _asclist
+    ([5, 4, 3, 2, 0], [5, 4, 3, 2, 0])
     >>> assert rs._asclist

     iterate ascending without fastasc:
     >>> rs = addset(xs, generatorset(ys), ascending=True)
     >>> assert rs.fastasc is None
-    >>> [x for x in rs]  # BROKEN
-    [0, 2, 2, 3, 4, 5]
+    >>> [x for x in rs]
+    [0, 2, 3, 4, 5]

     iterate descending without fastdesc:
     >>> rs = addset(generatorset(xs), ys, ascending=False)
     >>> assert rs.fastdesc is None
-    >>> [x for x in rs]  # BROKEN
-    [5, 4, 3, 2, 2, 0]
+    >>> [x for x in rs]
+    [5, 4, 3, 2, 0]
     """
     def __init__(self, revs1, revs2, ascending=None):
         self._r1 = revs1
@@ -3026,27 +3026,16 @@ class addset(abstractsmartset):
     def _iterator(self):
         """Iterate over both collections without repeating elements

-        If the ascending attribute is not set, iterate over the first one and
-        then over the second one checking for membership on the first one so we
-        dont yield any duplicates.
-
-        If the ascending attribute is set, iterate over both collections at the
-        same time, yielding only one value at a time in the given order.
+        No matter if the ascending attribute is set, iterate over the first one
+        and then over the second one checking for membership on the first one
+        so we don't yield any duplicates.
         """
-        if self._ascending is None:
-            def gen():
-                for r in self._r1:
-                    yield r
-                inr1 = self._r1.__contains__
-                for r in self._r2:
-                    if not inr1(r):
-                        yield r
-            gen = gen()
-        else:
-            iter1 = iter(self._r1)
-            iter2 = iter(self._r2)
-            gen = self._iterordered(self._ascending, iter1, iter2)
-        return gen
+        for r in self._r1:
+            yield r
+        inr1 = self._r1.__contains__
+        for r in self._r2:
+            if not inr1(r):
+                yield r

     def __iter__(self):
         if self._ascending is None:
```
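The behaviour the corrected doctests assert can be reproduced outside Mercurial with a standalone sketch (hypothetical names and example values, not the project's actual code): a union of two overlapping sets must report each revision once, so its length is 5 rather than 6.

```python
def addset_iter(r1, r2):
    """All of r1, then r2 minus r1 -- the patched _iterator() behaviour."""
    seen = set()
    for r in r1:
        seen.add(r)
        yield r
    for r in r2:
        if r not in seen:
            yield r

xs = [0, 3, 2]   # hypothetical left-hand revisions
ys = [5, 2, 4]   # hypothetical right-hand revisions; revision 2 overlaps
union = list(addset_iter(xs, ys))
assert union == [0, 3, 2, 5, 4]       # unsorted, but deduplicated
assert sorted(union) == [0, 2, 3, 4, 5]
assert len(union) == 5                # one entry per revision, not 6
```

The unsorted-but-deduplicated output is exactly what a later one-shot sort can turn into the ascending or descending lists the doctests expect.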