Patchwork [1,of,2] smartset: convert set to list lazily

login
register
mail settings
Submitter Martin von Zweigbergk
Date Feb. 19, 2017, 1:03 a.m.
Message ID <CAESOdVB=PHO1uPihDu2px+gsTpLgBn0vnHuWOUAFaKZa763t-w@mail.gmail.com>
Download mbox | patch
Permalink /patch/18647/
State Not Applicable
Headers show

Comments

Martin von Zweigbergk - Feb. 19, 2017, 1:03 a.m.
On Feb 17, 2017 10:02 PM, "Jun Wu" <quark@fb.com> wrote:

# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1487393969 28800
#      Fri Feb 17 20:59:29 2017 -0800
# Node ID b32af6cafb9b958decb7ae0db7dd754aa5a5f883
# Parent  01eebb65a61d9edcad1665ed747c7092f1ddb8b9
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r
b32af6cafb9b
smartset: convert set to list lazily

If the caller only wants to construct a baseset via a set, and then do
"__contains__" tests. It's unnecessary to initialize the list.

Testing on my unfiltered hg-committed repo where len(draft()) is 2600, this
patch shows about 6% improvement on set intensive queries:

Before:

  $ for i in `seq 5`; hg perfrevset 'draft() & draft() & draft() & draft()'
  ! wall 0.001196 comb 0.000000 user 0.000000 sys 0.000000 (best of 2011)
  ! wall 0.001191 comb 0.000000 user 0.000000 sys 0.000000 (best of 2099)
  ! wall 0.001186 comb 0.010000 user 0.010000 sys 0.000000 (best of 1953)
  ! wall 0.001182 comb 0.000000 user 0.000000 sys 0.000000 (best of 2135)
  ! wall 0.001193 comb 0.000000 user 0.000000 sys 0.000000 (best of 2177)

After:

  $ for i in `seq 5`; hg perfrevset 'draft() & draft() & draft() & draft()'
  ! wall 0.001128 comb 0.000000 user 0.000000 sys 0.000000 (best of 2247)
  ! wall 0.001119 comb 0.000000 user 0.000000 sys 0.000000 (best of 2317)
  ! wall 0.001115 comb 0.000000 user 0.000000 sys 0.000000 (best of 2244)
  ! wall 0.001131 comb 0.000000 user 0.000000 sys 0.000000 (best of 2093)
  ! wall 0.001124 comb 0.000000 user 0.000000 sys 0.000000 (best of 2134)

It could have bigger impact on larger sets in theory.

Could delete method, I think. Just FYI.


     def sort(self, reverse=False):
@@ -219,5 +229,8 @@ class baseset(abstractsmartset):

     def __len__(self):
-        return len(self._list)
+        if '_list' in self.__dict__:
+            return len(self._list)
+        else:
+            return len(self._set)

     def isascending(self):
Jun Wu - Feb. 19, 2017, 1:43 a.m.
Excerpts from Martin von Zweigbergk's message of 2017-02-18 17:03:12 -0800:
> @@ -205,5 +215,5 @@ class baseset(abstractsmartset):
> 
>      def __nonzero__(self):
> -        return bool(self._list)
> +        return bool(len(self))
> 
> Could delete method, I think. Just FYI.

No, because abstractsmartset implements __nonzero__ to
"raise NotImplementedError()".

I think that's reasonable as it will force people to not forget overriding
the __nonzero__ implementation, to make things more explicit.

Patch

diff --git a/mercurial/smartset.py b/mercurial/smartset.py
--- a/mercurial/smartset.py
+++ b/mercurial/smartset.py
@@ -172,6 +172,10 @@  class baseset(abstractsmartset):
                 # set has no order we pick one for stability purpose
                 self._ascending = True
-            data = list(data)
-        self._list = data
+                # converting set to list has a cost, do it lazily
+                data = None
+            else:
+                data = list(data)
+        if data is not None:
+            self._list = data
         self._datarepr = datarepr

@@ -186,4 +190,10 @@  class baseset(abstractsmartset):
         return asclist

+    @util.propertycache
+    def _list(self):
+        # _list is only lazily constructed if we have _set
+        assert '_set' in self.__dict__
+        return list(self._set)
+
     def __iter__(self):
         if self._ascending is None:
@@ -205,5 +215,5 @@  class baseset(abstractsmartset):

     def __nonzero__(self):
-        return bool(self._list)
+        return bool(len(self))