Submitter | Mads Kiilerich |
---|---|
Date | Oct. 25, 2016, 4:57 p.m. |
Message ID | <c2fe58cd4235fc6c8cab.1477414646@madski> |
Download | mbox | patch |
Permalink | /patch/17194/ |
State | Accepted |
Headers | show |
Comments
On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich <madski@unity3d.com> > # Date 1477414587 -7200 > # Tue Oct 25 18:56:27 2016 +0200 > # Branch stable > # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a > # Parent 76c57e1fe79b0980b377b4f305635dea393d6315 > revset: don't cache abstractsmartset min/max invocations infinitely > > There was a "leak", apparently introduced in ab66c1dee405. When running: > > hg = hglib.open('repo') > while True: > hg.log("max(branch('default'))") > > all filteredset instances from branch() would be cached indefinitely by the > @util.cachefunc annotation on the max() implementation. Indeed. We've cached {self: v} pair every time max() was called. Queued this for stable, thanks. > - @util.cachefunc > def min(self): > """return the minimum element in the set""" > - if self.fastasc is not None: > - for r in self.fastasc(): > - return r > - raise ValueError('arg is an empty sequence') > - return min(self) > - > - @util.cachefunc > + if self.fastasc is None: > + v = min(self) > + else: > + for v in self.fastasc(): > + break > + else: > + raise ValueError('arg is an empty sequence') > + self.min = lambda: v > + return v I slightly prefer using propertycache() and wrap it by a function, which is a common pattern seen in context.py, but that's just a nitpicking.
On 10/26/2016 01:21 PM, Yuya Nishihara wrote: > On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote: >> # HG changeset patch >> # User Mads Kiilerich <madski@unity3d.com> >> # Date 1477414587 -7200 >> # Tue Oct 25 18:56:27 2016 +0200 >> # Branch stable >> # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a >> # Parent 76c57e1fe79b0980b377b4f305635dea393d6315 >> revset: don't cache abstractsmartset min/max invocations infinitely >> >> There was a "leak", apparently introduced in ab66c1dee405. When running: >> >> hg = hglib.open('repo') >> while True: >> hg.log("max(branch('default'))") >> >> all filteredset instances from branch() would be cached indefinitely by the >> @util.cachefunc annotation on the max() implementation. > Indeed. We've cached {self: v} pair every time max() was called. > Queued this for stable, thanks. > >> - @util.cachefunc >> def min(self): >> """return the minimum element in the set""" >> - if self.fastasc is not None: >> - for r in self.fastasc(): >> - return r >> - raise ValueError('arg is an empty sequence') >> - return min(self) >> - >> - @util.cachefunc >> + if self.fastasc is None: >> + v = min(self) >> + else: >> + for v in self.fastasc(): >> + break >> + else: >> + raise ValueError('arg is an empty sequence') >> + self.min = lambda: v >> + return v > I slightly prefer using propertycache() and wrap it by a function, which > is a common pattern seen in context.py, but that's just a nitpicking. Exactly what do you mean with wrapping the property by a function? I don't see a clear pattern of that in context.py? Instead, I would perhaps prefer to introduce a 'gettercache' that works on methods that only have self as parameter (and thus can set a simple instance method as I do here) ... or a 'methodcache' that would be like cachefunc but store the cache on the first 'self' parameter. /Mads
On Wed, 26 Oct 2016 14:09:17 +0200, Mads Kiilerich wrote: > On 10/26/2016 01:21 PM, Yuya Nishihara wrote: > > On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote: > >> # HG changeset patch > >> # User Mads Kiilerich <madski@unity3d.com> > >> # Date 1477414587 -7200 > >> # Tue Oct 25 18:56:27 2016 +0200 > >> # Branch stable > >> # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a > >> # Parent 76c57e1fe79b0980b377b4f305635dea393d6315 > >> revset: don't cache abstractsmartset min/max invocations infinitely > >> > >> There was a "leak", apparently introduced in ab66c1dee405. When running: > >> > >> hg = hglib.open('repo') > >> while True: > >> hg.log("max(branch('default'))") > >> > >> all filteredset instances from branch() would be cached indefinitely by the > >> @util.cachefunc annotation on the max() implementation. > > Indeed. We've cached {self: v} pair every time max() was called. > > Queued this for stable, thanks. > > > >> - @util.cachefunc > >> def min(self): > >> """return the minimum element in the set""" > >> - if self.fastasc is not None: > >> - for r in self.fastasc(): > >> - return r > >> - raise ValueError('arg is an empty sequence') > >> - return min(self) > >> - > >> - @util.cachefunc > >> + if self.fastasc is None: > >> + v = min(self) > >> + else: > >> + for v in self.fastasc(): > >> + break > >> + else: > >> + raise ValueError('arg is an empty sequence') > >> + self.min = lambda: v > >> + return v > > I slightly prefer using propertycache() and wrap it by a function, which > > is a common pattern seen in context.py, but that's just a nitpicking. > > Exactly what do you mean with wrapping the property by a function? I > don't see a clear pattern of that in context.py? Yes. What I have in mind is parents() -> self._parents for instance. > Instead, I would perhaps prefer to introduce a 'gettercache' that works > on methods that only have self as parameter (and thus can set a simple > instance method as I do here) ... or a 'methodcache' that would be like > cachefunc but store the cache on the first 'self' parameter. That also seems fine. (I'm not a big fan of overwriting methods since we're likely to create self->self cycle, but that wouldn't be the case here.)
On 10/26/2016 02:31 PM, Yuya Nishihara wrote: > On Wed, 26 Oct 2016 14:09:17 +0200, Mads Kiilerich wrote: >> Instead, I would perhaps prefer to introduce a 'gettercache' that works >> on methods that only have self as parameter (and thus can set a simple >> instance method as I do here) ... or a 'methodcache' that would be like >> cachefunc but store the cache on the first 'self' parameter. > That also seems fine. (I'm not a big fan of overwriting methods since we're > likely to create self->self cycle, but that wouldn't be the case here.) I guess it could be slightly more gc efficient to use 'lambda v=v: v' to avoid having a reference to v in the outer scope that also has self ... /Mads
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -2924,23 +2924,29 @@ class abstractsmartset(object): """True if the set will iterate in topographical order""" raise NotImplementedError() - @util.cachefunc def min(self): """return the minimum element in the set""" - if self.fastasc is not None: - for r in self.fastasc(): - return r - raise ValueError('arg is an empty sequence') - return min(self) - - @util.cachefunc + if self.fastasc is None: + v = min(self) + else: + for v in self.fastasc(): + break + else: + raise ValueError('arg is an empty sequence') + self.min = lambda: v + return v + def max(self): """return the maximum element in the set""" - if self.fastdesc is not None: - for r in self.fastdesc(): - return r - raise ValueError('arg is an empty sequence') - return max(self) + if self.fastdesc is None: + return max(self) + else: + for v in self.fastdesc(): + break + else: + raise ValueError('arg is an empty sequence') + self.max = lambda: v + return v def first(self): """return the first element in the set (user iteration perspective)