Patchwork [3,of,4] revset: use subset itself if it is instance of specific classes

login
register
mail settings
Submitter Katsunori FUJIWARA
Date March 29, 2013, 4:16 p.m.
Message ID <8af24dd8f88c5f8011ae.1364573793@feefifofum>
Download mbox | patch
Permalink /patch/1219/
State Rejected
Headers show

Comments

Katsunori FUJIWARA - March 29, 2013, 4:16 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1364573132 -32400
# Node ID 8af24dd8f88c5f8011ae32bb7256977c94eab238
# Parent  bede15731ca703fdf8ca8815793029766c3fc893
revset: use subset itself if it is instance of specific classes

Before this patch, "set(subset)" is always invoked, even though
"__contains__" of "subset" is enough cheap. This causes scanning all
items in "subset", and may have large performance impact.

This patch uses subset itself if it is instance of "set" or
"_safesubset", because "__contains__" of such instances is cheap
enough.

This patch affects predicates below:

  - rangeset(":")
  - dagrange("::")
  - limit
  - last
  - _list(used internaly via "repo.revs()" or "repo.set()")

Results of "hg perfrevset" for each revspec (before/after this patch)
on the repository containing 40000 revisions are shown below:

  - "tip:tip":
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1381)
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 23399)

  - "tip::tip":
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1344)
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 17864)

  - "limit(tip)":
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1325)
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 27536)

  - "last(tip)":
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1378)
    ! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 27684)

Patch

diff -r bede15731ca7 -r 8af24dd8f88c contrib/check-code.py
--- a/contrib/check-code.py	Sat Mar 30 01:05:32 2013 +0900
+++ b/contrib/check-code.py	Sat Mar 30 01:05:32 2013 +0900
@@ -283,6 +283,8 @@ 
   [
     (r'\blist\(repo\)',
      "use _safesubset(repo) as subset instead of list(repo) in revset"),
+    (r'\bset\(subset\)',
+     "use _ensureset(subset) instead of set(suset) in revset"),
   ],
   # warnings
   []
diff -r bede15731ca7 -r 8af24dd8f88c mercurial/revset.py
--- a/mercurial/revset.py	Sat Mar 30 01:05:32 2013 +0900
+++ b/mercurial/revset.py	Sat Mar 30 01:05:32 2013 +0900
@@ -218,6 +218,11 @@ 
     def __contains__(self, key):
         return key in self._o
 
+def _ensureset(subset):
+    if not isinstance(subset, (set, _safesubset)):
+        return set(subset) # check-code-ignore
+    return subset
+
 # operator methods
 
 def stringset(repo, subset, x):
@@ -246,14 +251,14 @@ 
         r = range(m, n + 1)
     else:
         r = range(m, n - 1, -1)
-    s = set(subset)
+    s = _ensureset(subset)
     return [x for x in r if x in s]
 
 def dagrange(repo, subset, x, y):
     if subset:
         r = _safesubset(repo)
         xs = _revsbetween(repo, getset(repo, r, x), getset(repo, r, y))
-        s = set(subset)
+        s = _ensureset(subset)
         return [r for r in xs if r in s]
     return []
 
@@ -943,7 +948,7 @@ 
     except (TypeError, ValueError):
         # i18n: "limit" is a keyword
         raise error.ParseError(_("limit expects a number"))
-    ss = set(subset)
+    ss = _ensureset(subset)
     os = getset(repo, _safesubset(repo), l[0])[:lim]
     return [r for r in os if r in ss]
 
@@ -961,7 +966,7 @@ 
     except (TypeError, ValueError):
         # i18n: "last" is a keyword
         raise error.ParseError(_("last expects a number"))
-    ss = set(subset)
+    ss = _ensureset(subset)
     os = getset(repo, _safesubset(repo), l[0])[-lim:]
     return [r for r in os if r in ss]
 
@@ -1542,8 +1547,7 @@ 
     s = getstring(x, "internal error")
     if not s:
         return []
-    if not isinstance(subset, set):
-        subset = set(subset)
+    subset = _ensureset(subset)
     ls = [repo[r].rev() for r in s.split('\0')]
     return [r for r in ls if r in subset]