Patchwork [2,of,5] revset: make internal _list() expression remove duplicated revisions

login
register
mail settings
Submitter Yuya Nishihara
Date May 29, 2015, 2:38 p.m.
Message ID <5be635444c0156a6d068.1432910285@mimosa>
Download mbox | patch
Permalink /patch/9357/
State Accepted
Headers show

Comments

Yuya Nishihara - May 29, 2015, 2:38 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1432446581 -32400
#      Sun May 24 14:49:41 2015 +0900
# Node ID 5be635444c0156a6d0688d5ba3688a9a9db7f561
# Parent  5617fab1b1f1905663bd38053144a93a746f34db
revset: make internal _list() expression remove duplicated revisions

This allows us to optimize chained 'or' operations to _list() expression.

Unlike _intlist() or _hexlist(), it's difficult to remove duplicates by the
caller of _list() because different symbols can point to the same revision.
If the caller knows all symbols are unique, that probably means revisions or
nodes are known, therefore, _intlist() or _hexlist() should be used instead.
So, it makes sense to check duplicates by _list() function.

'%ls' is no longer used in core, this won't cause performance regression.
Pierre-Yves David - May 29, 2015, 7:43 p.m.
On 05/29/2015 07:38 AM, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1432446581 -32400
> #      Sun May 24 14:49:41 2015 +0900
> # Node ID 5be635444c0156a6d0688d5ba3688a9a9db7f561
> # Parent  5617fab1b1f1905663bd38053144a93a746f34db
> revset: make internal _list() expression remove duplicated revisions
>
> This allows us to optimize chained 'or' operations to _list() expression.
>
> Unlike _intlist() or _hexlist(), it's difficult to remove duplicates by the
> caller of _list() because different symbols can point to the same revision.
> If the caller knows all symbols are unique, that probably means revisions or
> nodes are known, therefore, _intlist() or _hexlist() should be used instead.
> So, it makes sense to check duplicates by _list() function.
>
> '%ls' is no longer used in core, this won't cause performance regression.

What about we declare the %lX are to be considered "ascending" and we 
throw the whole thing into a set before handing out the result to 
baseset() ?

This is an API change, but we do not have API, do we?

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1920,9 +1920,18 @@  def _list(repo, subset, x):
     s = getstring(x, "internal error")
     if not s:
         return baseset()
-    ls = [repo[r].rev() for r in s.split('\0')]
-    s = subset
-    return baseset([r for r in ls if r in s])
+    # remove duplicates here. it's difficult for caller to deduplicate sets
+    # because different symbols can point to the same rev.
+    ls = []
+    seen = set()
+    for t in s.split('\0'):
+        r = repo[t].rev()
+        if r in seen:
+            continue
+        if r in subset:
+            ls.append(r)
+        seen.add(r)
+    return baseset(ls)
 
 # for internal use
 def _intlist(repo, subset, x):