Patchwork [stable] revset: don't cache abstractsmartset min/max invocations infinitely

login
register
mail settings
Submitter Mads Kiilerich
Date Oct. 25, 2016, 4:57 p.m.
Message ID <c2fe58cd4235fc6c8cab.1477414646@madski>
Download mbox | patch
Permalink /patch/17194/
State Accepted
Headers show

Comments

Mads Kiilerich - Oct. 25, 2016, 4:57 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1477414587 -7200
#      Tue Oct 25 18:56:27 2016 +0200
# Branch stable
# Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a
# Parent  76c57e1fe79b0980b377b4f305635dea393d6315
revset: don't cache abstractsmartset min/max invocations infinitely

There was a "leak", apparently introduced in ab66c1dee405. When running:

    hg = hglib.open('repo')
    while True:
        hg.log("max(branch('default'))")

all filteredset instances from branch() would be cached indefinitely by the
@util.cachefunc annotation on the max() implementation.

util.cachefunc seems dangerous as method decorator and is barely used elsewhere
in the code base. Instead, just open code caching by having the min/max
methods replace themselves with a plain lambda returning the result.
Yuya Nishihara - Oct. 26, 2016, 11:21 a.m.
On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1477414587 -7200
> #      Tue Oct 25 18:56:27 2016 +0200
> # Branch stable
> # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a
> # Parent  76c57e1fe79b0980b377b4f305635dea393d6315
> revset: don't cache abstractsmartset min/max invocations infinitely
> 
> There was a "leak", apparently introduced in ab66c1dee405. When running:
> 
>     hg = hglib.open('repo')
>     while True:
>         hg.log("max(branch('default'))")
> 
> all filteredset instances from branch() would be cached indefinitely by the
> @util.cachefunc annotation on the max() implementation.

Indeed. We've cached {self: v} pair every time max() was called.
Queued this for stable, thanks.

> -    @util.cachefunc
>      def min(self):
>          """return the minimum element in the set"""
> -        if self.fastasc is not None:
> -            for r in self.fastasc():
> -                return r
> -            raise ValueError('arg is an empty sequence')
> -        return min(self)
> -
> -    @util.cachefunc
> +        if self.fastasc is None:
> +            v = min(self)
> +        else:
> +            for v in self.fastasc():
> +                break
> +            else:
> +                raise ValueError('arg is an empty sequence')
> +        self.min = lambda: v
> +        return v

I slightly prefer using propertycache() and wrap it by a function, which
is a common pattern seen in context.py, but that's just a nitpicking.
Mads Kiilerich - Oct. 26, 2016, 12:09 p.m.
On 10/26/2016 01:21 PM, Yuya Nishihara wrote:
> On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote:
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1477414587 -7200
>> #      Tue Oct 25 18:56:27 2016 +0200
>> # Branch stable
>> # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a
>> # Parent  76c57e1fe79b0980b377b4f305635dea393d6315
>> revset: don't cache abstractsmartset min/max invocations infinitely
>>
>> There was a "leak", apparently introduced in ab66c1dee405. When running:
>>
>>      hg = hglib.open('repo')
>>      while True:
>>          hg.log("max(branch('default'))")
>>
>> all filteredset instances from branch() would be cached indefinitely by the
>> @util.cachefunc annotation on the max() implementation.
> Indeed. We've cached {self: v} pair every time max() was called.
> Queued this for stable, thanks.
>
>> -    @util.cachefunc
>>       def min(self):
>>           """return the minimum element in the set"""
>> -        if self.fastasc is not None:
>> -            for r in self.fastasc():
>> -                return r
>> -            raise ValueError('arg is an empty sequence')
>> -        return min(self)
>> -
>> -    @util.cachefunc
>> +        if self.fastasc is None:
>> +            v = min(self)
>> +        else:
>> +            for v in self.fastasc():
>> +                break
>> +            else:
>> +                raise ValueError('arg is an empty sequence')
>> +        self.min = lambda: v
>> +        return v
> I slightly prefer using propertycache() and wrap it by a function, which
> is a common pattern seen in context.py, but that's just a nitpicking.

Exactly what do you mean with wrapping the property by a function? I 
don't see a clear pattern of that in context.py?

Instead, I would perhaps prefer to introduce a 'gettercache' that works 
on methods that only have self as parameter (and thus can set a simple 
instance method as I do here)  ... or a 'methodcache' that would be like 
cachefunc but store the cache on the first 'self' parameter.

/Mads
Yuya Nishihara - Oct. 26, 2016, 12:31 p.m.
On Wed, 26 Oct 2016 14:09:17 +0200, Mads Kiilerich wrote:
> On 10/26/2016 01:21 PM, Yuya Nishihara wrote:
> > On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote:
> >> # HG changeset patch
> >> # User Mads Kiilerich <madski@unity3d.com>
> >> # Date 1477414587 -7200
> >> #      Tue Oct 25 18:56:27 2016 +0200
> >> # Branch stable
> >> # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a
> >> # Parent  76c57e1fe79b0980b377b4f305635dea393d6315
> >> revset: don't cache abstractsmartset min/max invocations infinitely
> >>
> >> There was a "leak", apparently introduced in ab66c1dee405. When running:
> >>
> >>      hg = hglib.open('repo')
> >>      while True:
> >>          hg.log("max(branch('default'))")
> >>
> >> all filteredset instances from branch() would be cached indefinitely by the
> >> @util.cachefunc annotation on the max() implementation.
> > Indeed. We've cached {self: v} pair every time max() was called.
> > Queued this for stable, thanks.
> >
> >> -    @util.cachefunc
> >>       def min(self):
> >>           """return the minimum element in the set"""
> >> -        if self.fastasc is not None:
> >> -            for r in self.fastasc():
> >> -                return r
> >> -            raise ValueError('arg is an empty sequence')
> >> -        return min(self)
> >> -
> >> -    @util.cachefunc
> >> +        if self.fastasc is None:
> >> +            v = min(self)
> >> +        else:
> >> +            for v in self.fastasc():
> >> +                break
> >> +            else:
> >> +                raise ValueError('arg is an empty sequence')
> >> +        self.min = lambda: v
> >> +        return v
> > I slightly prefer using propertycache() and wrap it by a function, which
> > is a common pattern seen in context.py, but that's just a nitpicking.
> 
> Exactly what do you mean with wrapping the property by a function? I 
> don't see a clear pattern of that in context.py?

Yes. What I have in mind is parents() -> self._parents for instance.

> Instead, I would perhaps prefer to introduce a 'gettercache' that works 
> on methods that only have self as parameter (and thus can set a simple 
> instance method as I do here)  ... or a 'methodcache' that would be like 
> cachefunc but store the cache on the first 'self' parameter.

That also seems fine. (I'm not a big fan of overwriting methods since we're
likely to create self->self cycle, but that wouldn't be the case here.)
Mads Kiilerich - Oct. 26, 2016, 1:13 p.m.
On 10/26/2016 02:31 PM, Yuya Nishihara wrote:
> On Wed, 26 Oct 2016 14:09:17 +0200, Mads Kiilerich wrote:
>> Instead, I would perhaps prefer to introduce a 'gettercache' that works
>> on methods that only have self as parameter (and thus can set a simple
>> instance method as I do here)  ... or a 'methodcache' that would be like
>> cachefunc but store the cache on the first 'self' parameter.
> That also seems fine. (I'm not a big fan of overwriting methods since we're
> likely to create self->self cycle, but that wouldn't be the case here.)


I guess it could be slightly more gc efficient to use 'lambda v=v: v' to 
avoid having a reference to v in the outer scope that also has self ...

/Mads

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2924,23 +2924,29 @@  class abstractsmartset(object):
         """True if the set will iterate in topographical order"""
         raise NotImplementedError()
 
-    @util.cachefunc
     def min(self):
         """return the minimum element in the set"""
-        if self.fastasc is not None:
-            for r in self.fastasc():
-                return r
-            raise ValueError('arg is an empty sequence')
-        return min(self)
-
-    @util.cachefunc
+        if self.fastasc is None:
+            v = min(self)
+        else:
+            for v in self.fastasc():
+                break
+            else:
+                raise ValueError('arg is an empty sequence')
+        self.min = lambda: v
+        return v
+
     def max(self):
         """return the maximum element in the set"""
-        if self.fastdesc is not None:
-            for r in self.fastdesc():
-                return r
-            raise ValueError('arg is an empty sequence')
-        return max(self)
+        if self.fastdesc is None:
+            return max(self)
+        else:
+            for v in self.fastdesc():
+                break
+            else:
+                raise ValueError('arg is an empty sequence')
+        self.max = lambda: v
+        return v
 
     def first(self):
         """return the first element in the set (user iteration perspective)