Submitter | Pierre-Yves David |
---|---|
Date | May 14, 2015, 5:38 a.m. |
Message ID | <555434D8.2010402@ens-lyon.org> |
Download | mbox | patch |
Permalink | /patch/9067/ |
State | Not Applicable |
Headers | show |
Comments
On Wed, 13 May 2015 22:38:32 -0700, Pierre-Yves David wrote: > On 05/13/2015 07:39 AM, Yuya Nishihara wrote: > > On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote: > >> On 05/12/2015 03:59 PM, Yuya Nishihara wrote: > >>> # HG changeset patch > >>> # User Yuya Nishihara <yuya@tcha.org> > >>> # Date 1427716479 -32400 > >>> # Mon Mar 30 20:54:39 2015 +0900 > >>> # Node ID cc7253d43150ef937aca205f2cb66a569378aeea > >>> # Parent e487cb80cc333a4d92376daa29c08fa703c5b249 > >>> revset: do not try to sort revisions unnecessarily by addset._iterator() > >>> > >>> This fixes addset to remove duplicated revisions from right-hand-side set > >>> even if one of the subset does not support fastasc/fastdesc. > >>> > >>> addset._iterator() did not do the right thing if _ascending is set, because > >>> addset._iterordered() expects that the given sets are sorted. This patch > >>> removes the broken implementation. There's no need for addset._iterator() > >>> to sort revisions because the result will be sorted by _trysetasclist() > >>> as necessary. > >> > >> I agree the current behavior is buggy. But I'm dubious about your the > >> way to solve it. > >> > >> The special casing of 'self._ascending' is a good correctness and > >> optimisation purpose. > >> > >> I see two potential issues here: > >> > >> 1) we use iterator from self._r1 and self._r2 without ensuring they are > >> sorted in the order we need them to be sorted (or using appropriate > >> directed iterator). > >> > >> 2) _iterordered is apparently not doing deduplication. > >> > >> Fixing these two should solve the bug without performance and > >> correctness impact? > >> > >> As far as I understand your patch, sorting the addset would result into > >> a sorted output most case. > > > > _iterordered() can remove duplicates if both r1 and r2 are sorted. So we have > > to fix (1). > > This small patch seems to fix the duplication issue and fix the problem. > > diff --git a/mercurial/revset.py b/mercurial/revset.py > --- a/mercurial/revset.py > +++ b/mercurial/revset.py > @@ -3044,4 +3044,6 @@ class addset(abstractsmartset): > gen = gen() > else: > + self._r1.sort(not self._ascending) > + self._r2.sort(not self._ascending) > iter1 = iter(self._r1) > iter2 = iter(self._r2) Yes. I was afraid of mutating r1 and r2 here because input sets might be shared. > > But, when _iterator() is called with _ascending is not None (*1), we know at > > least one of r1 or r2 does not support fastasc/fastdesc. So we'll have to > > consume r1 or r2 anyway to get it sorted. I think the use of _iterordered() had > > little benefit in this case. > > > > (*1) except when called from __len__(), which can be assumed to be slow > > Is _ascending is not None, the iteration is requested to be done in a > sorted order. I would assume than using _iterator here could give us > data in the wrong order, but your test seems to disagree. I'll give a > deeper look at why the code still behave right tomorrow. That is because the result of _iterator() is sorted by _trysetasclist(). __iter__() _list # consume the gen and try again _genlist = _iterator() __iter__() # reenter _trysetasclist() _asclist = sorted(_genlist) fastasc() or fastdesc() return _asclist.__iter__ or _asclist.__reversed__ Regards,
On 05/14/2015 07:27 AM, Yuya Nishihara wrote: > On Wed, 13 May 2015 22:38:32 -0700, Pierre-Yves David wrote: >> On 05/13/2015 07:39 AM, Yuya Nishihara wrote: >>> On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote: >>>> On 05/12/2015 03:59 PM, Yuya Nishihara wrote: >>>>> # HG changeset patch >>>>> # User Yuya Nishihara <yuya@tcha.org> >>>>> # Date 1427716479 -32400 >>>>> # Mon Mar 30 20:54:39 2015 +0900 >>>>> # Node ID cc7253d43150ef937aca205f2cb66a569378aeea >>>>> # Parent e487cb80cc333a4d92376daa29c08fa703c5b249 >>>>> revset: do not try to sort revisions unnecessarily by addset._iterator() >>>>> >>>>> This fixes addset to remove duplicated revisions from right-hand-side set >>>>> even if one of the subset does not support fastasc/fastdesc. >>>>> >>>>> addset._iterator() did not do the right thing if _ascending is set, because >>>>> addset._iterordered() expects that the given sets are sorted. This patch >>>>> removes the broken implementation. There's no need for addset._iterator() >>>>> to sort revisions because the result will be sorted by _trysetasclist() >>>>> as necessary. >>>> >>>> I agree the current behavior is buggy. But I'm dubious about your the >>>> way to solve it. >>>> >>>> The special casing of 'self._ascending' is a good correctness and >>>> optimisation purpose. >>>> >>>> I see two potential issues here: >>>> >>>> 1) we use iterator from self._r1 and self._r2 without ensuring they are >>>> sorted in the order we need them to be sorted (or using appropriate >>>> directed iterator). >>>> >>>> 2) _iterordered is apparently not doing deduplication. >>>> >>>> Fixing these two should solve the bug without performance and >>>> correctness impact? >>>> >>>> As far as I understand your patch, sorting the addset would result into >>>> a sorted output most case. >>> >>> _iterordered() can remove duplicates if both r1 and r2 are sorted. So we have >>> to fix (1). >> >> This small patch seems to fix the duplication issue and fix the problem. >> >> diff --git a/mercurial/revset.py b/mercurial/revset.py >> --- a/mercurial/revset.py >> +++ b/mercurial/revset.py >> @@ -3044,4 +3044,6 @@ class addset(abstractsmartset): >> gen = gen() >> else: >> + self._r1.sort(not self._ascending) >> + self._r2.sort(not self._ascending) >> iter1 = iter(self._r1) >> iter2 = iter(self._r2) > > Yes. I was afraid of mutating r1 and r2 here because input sets might be > shared. I cannot remember if mutating you sub-smartset is a sin or not. I've patchbombed a version that stay quite conservative on this approach >>> But, when _iterator() is called with _ascending is not None (*1), we know at >>> least one of r1 or r2 does not support fastasc/fastdesc. So we'll have to >>> consume r1 or r2 anyway to get it sorted. I think the use of _iterordered() had >>> little benefit in this case. >>> >>> (*1) except when called from __len__(), which can be assumed to be slow >> >> Is _ascending is not None, the iteration is requested to be done in a >> sorted order. I would assume than using _iterator here could give us >> data in the wrong order, but your test seems to disagree. I'll give a >> deeper look at why the code still behave right tomorrow. > > That is because the result of _iterator() is sorted by _trysetasclist(). > > __iter__() > _list # consume the gen and try again > _genlist = _iterator() > __iter__() # reenter > _trysetasclist() > _asclist = sorted(_genlist) > fastasc() or fastdesc() > return _asclist.__iter__ or _asclist.__reversed__ This _iterator() thing is actually a big silly. I think we could be lazyer (cf my pre-cited patch)
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -3044,4 +3044,6 @@ class addset(abstractsmartset): gen = gen() else: + self._r1.sort(not self._ascending) + self._r2.sort(not self._ascending) iter1 = iter(self._r1) iter2 = iter(self._r2)