Submitter | Yuya Nishihara |
---|---|
Date | May 29, 2015, 2:38 p.m. |
Message ID | <5be635444c0156a6d068.1432910285@mimosa> |
Download | mbox | patch |
Permalink | /patch/9357/ |
State | Accepted |
Headers | show |
Comments
On 05/29/2015 07:38 AM, Yuya Nishihara wrote: > # HG changeset patch > # User Yuya Nishihara <yuya@tcha.org> > # Date 1432446581 -32400 > # Sun May 24 14:49:41 2015 +0900 > # Node ID 5be635444c0156a6d0688d5ba3688a9a9db7f561 > # Parent 5617fab1b1f1905663bd38053144a93a746f34db > revset: make internal _list() expression remove duplicated revisions > > This allows us to optimize chained 'or' operations to _list() expression. > > Unlike _intlist() or _hexlist(), it's difficult to remove duplicates by the > caller of _list() because different symbols can point to the same revision. > If the caller knows all symbols are unique, that probably means revisions or > nodes are known, therefore, _intlist() or _hexlist() should be used instead. > So, it makes sense to check duplicates by _list() function. > > '%ls' is no longer used in core, this won't cause performance regression. What about we declare the %lX are to be considered "ascending" and we throw the whole thing into a set before handing out the result to baseset() ? This is an API change, but we do not have API, do we?
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -1920,9 +1920,18 @@ def _list(repo, subset, x): s = getstring(x, "internal error") if not s: return baseset() - ls = [repo[r].rev() for r in s.split('\0')] - s = subset - return baseset([r for r in ls if r in s]) + # remove duplicates here. it's difficult for caller to deduplicate sets + # because different symbols can point to the same rev. + ls = [] + seen = set() + for t in s.split('\0'): + r = repo[t].rev() + if r in seen: + continue + if r in subset: + ls.append(r) + seen.add(r) + return baseset(ls) # for internal use def _intlist(repo, subset, x):