Patchwork revset: added negative cache to lazysets

login
register
mail settings
Submitter Lucas Moscovicz
Date Feb. 13, 2014, 2:34 a.m.
Message ID <7d11b638a9cf4e99213e.1392258849@dev1037.prn2.facebook.com>
Download mbox | patch
Permalink /patch/3634/
State Deferred
Headers show

Comments

Lucas Moscovicz - Feb. 13, 2014, 2:34 a.m.
# HG changeset patch
# User Lucas Moscovicz <lmoscovicz@fb.com>
# Date 1391556717 28800
#      Tue Feb 04 15:31:57 2014 -0800
# Node ID 7d11b638a9cf4e99213e73a704a497da5a1e2d2b
# Parent  cf2a8a38d2665eff5787de8c12d4d23031493ba6
revset: added negative cache to lazysets

The idea of this is to be able to answer faster when something is not
contained in the lazyset instead of having to calculate it again.
Matt Mackall - Feb. 13, 2014, 3:23 p.m.
On Wed, 2014-02-12 at 18:34 -0800, Lucas Moscovicz wrote:
> # HG changeset patch
> # User Lucas Moscovicz <lmoscovicz@fb.com>
> # Date 1391556717 28800
> #      Tue Feb 04 15:31:57 2014 -0800
> # Node ID 7d11b638a9cf4e99213e73a704a497da5a1e2d2b
> # Parent  cf2a8a38d2665eff5787de8c12d4d23031493ba6
> revset: added negative cache to lazysets
> 
> The idea of this is to be able to answer faster when something is not
> contained in the lazyset instead of having to calculate it again.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2100,9 +2100,15 @@
>      def __init__(self, subset, condition):
>          self._subset = subset
>          self._condition = condition
> +        self._ncache = set()
>  
>      def __contains__(self, x):
> -        return x in self._subset and self._condition(x)

Let's do an approximate operation count on this:

> +        if x in self._ncache:

if, "in", name lookup in self

> +            return False
> +        c =  x in self._subset and self._condition(x)

in, and, name lookup, function call, assign

> +        if not c:

if

> +            self._ncache.add(x)

name lookup x 2, set add

> +        return c

return.

In the fast path, we do:

 if, in, lookup, return

In the slow path, we do:

 if, in, lookup, in, lookup, func, assign, if, and, lookup, lookup,
 set add, return

Reminder: these foo.bar.baz name lookup operations are really expensive!

Now consider this:

    self._cache = {}
...
    c = self._cache
    if x not in c:
        c[x] = x in self._subset and self._condition(x)
    return c[x]

This is the pattern we use for caching all over Mercurial. Fast path:

 lookup, assign, if, in, return

Slow path:

 lookup, assign, if, in, lookup, lookup, func, dict add, and, return

This has fewer total ops, fewer name lookups.. and it caches both
positive and negative results!

Depending on what we're caching, we can drop the initial "c = " to get a
marginally faster fast path in exchange for a slower slow path. If the
fast path is more common than the slow path, this will be a win. But for
caches where we expect the slow path to be more common, we want to use
the above.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2100,9 +2100,15 @@ 
     def __init__(self, subset, condition):
         self._subset = subset
         self._condition = condition
+        self._ncache = set()
 
     def __contains__(self, x):
-        return x in self._subset and self._condition(x)
+        if x in self._ncache:
+            return False
+        c =  x in self._subset and self._condition(x)
+        if not c:
+            self._ncache.add(x)
+        return c
 
     def __iter__(self):
         cond = self._condition