Patchwork [3,of,3,v2] repoview: cache repoview class instances (issue5043)

login
register
mail settings
Submitter Gregory Szorc
Date Jan. 17, 2016, 10:18 p.m.
Message ID <3a418d20ca9d0345416d.1453069122@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/12820/
State Deferred
Delegated to: Pierre-Yves David
Headers show

Comments

Gregory Szorc - Jan. 17, 2016, 10:18 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1453069024 28800
#      Sun Jan 17 14:17:04 2016 -0800
# Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
# Parent  91969eee20a1dcd52470258164bc44528861a4c6
repoview: cache repoview class instances (issue5043)

The Python garbage collector was reporting the proxycls instances
created on every invocation of localrepository.filtered() were
uncollectable. This led to leaking memory.

We fix the leak by introducing a cache of the proxy classes. After
this patch, the garbage collector no longer reports uncollectable
objects after each iteration of `hg convert`. This likely prevents
memory leaks in a number of other operations as well (pretty much
anything relying heavily on repo.filtered() calls).

We stop short of caching the instantiated proxycls instances, which
we /should/ be able to do safely. This feels a bit too risky for
a change just before code freeze, however.
Pierre-Yves David - Jan. 18, 2016, 1:10 a.m.
On 01/17/2016 02:18 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1453069024 28800
> #      Sun Jan 17 14:17:04 2016 -0800
> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
> repoview: cache repoview class instances (issue5043)
>
> The Python garbage collector was reporting the proxycls instances
> created on every invocation of localrepository.filtered() were
> uncollectable. This led to leaking memory.
>
> We fix the leak by introducing a cache of the proxy classes. After
> this patch, the garbage collector no longer reports uncollectable
> objects after each iteration of `hg convert`. This likely prevents
> memory leaks in a number of other operations as well (pretty much
> anything relying heavily on repo.filtered() calls).
>
> We stop short of caching the instantiated proxycls instances, which
> we /should/ be able to do safely. This feels a bit too risky for
> a change just before code freeze, however.

I'm either missing something or the behavior of this patch is incorrect.

If the class of the repo objet changes, we won't invalidate the cache, 
and the next call to "filtered" will return an object with an outdated 
proxy class.
Gregory Szorc - Jan. 18, 2016, 1:19 a.m.
> On Jan 17, 2016, at 17:10, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> 
> 
>> On 01/17/2016 02:18 PM, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1453069024 28800
>> #      Sun Jan 17 14:17:04 2016 -0800
>> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
>> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
>> repoview: cache repoview class instances (issue5043)
>> 
>> The Python garbage collector was reporting the proxycls instances
>> created on every invocation of localrepository.filtered() were
>> uncollectable. This led to leaking memory.
>> 
>> We fix the leak by introducing a cache of the proxy classes. After
>> this patch, the garbage collector no longer reports uncollectable
>> objects after each iteration of `hg convert`. This likely prevents
>> memory leaks in a number of other operations as well (pretty much
>> anything relying heavily on repo.filtered() calls).
>> 
>> We stop short of caching the instantiated proxycls instances, which
>> we /should/ be able to do safely. This feels a bit too risky for
>> a change just before code freeze, however.
> 
> I'm either missing something or the behavior of this patch is incorrect.
> 
> If the class of the repo objet changes, we won't invalidate the cache, and the next call to "filtered" will return an object with an outdated proxy class.

That's a valid observation. But the class should be static after reposetup functions are called. And we shouldn't be obtaining a repoview before reposetup is called, right?
Pierre-Yves David - Jan. 18, 2016, 1:24 a.m.
On 01/17/2016 05:19 PM, Gregory Szorc wrote:
>
>
>> On Jan 17, 2016, at 17:10, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
>>
>>
>>
>>> On 01/17/2016 02:18 PM, Gregory Szorc wrote:
>>> # HG changeset patch
>>> # User Gregory Szorc <gregory.szorc@gmail.com>
>>> # Date 1453069024 28800
>>> #      Sun Jan 17 14:17:04 2016 -0800
>>> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
>>> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
>>> repoview: cache repoview class instances (issue5043)
>>>
>>> The Python garbage collector was reporting the proxycls instances
>>> created on every invocation of localrepository.filtered() were
>>> uncollectable. This led to leaking memory.
>>>
>>> We fix the leak by introducing a cache of the proxy classes. After
>>> this patch, the garbage collector no longer reports uncollectable
>>> objects after each iteration of `hg convert`. This likely prevents
>>> memory leaks in a number of other operations as well (pretty much
>>> anything relying heavily on repo.filtered() calls).
>>>
>>> We stop short of caching the instantiated proxycls instances, which
>>> we /should/ be able to do safely. This feels a bit too risky for
>>> a change just before code freeze, however.
>>
>> I'm either missing something or the behavior of this patch is incorrect.
>>
>> If the class of the repo objet changes, we won't invalidate the cache, and the next call to "filtered" will return an object with an outdated proxy class.
>
> That's a valid observation. But the class should be static after reposetup functions are called. And we shouldn't be obtaining a repoview before reposetup is called, right?

I would not bet on class being static (and I'am pretty sure it is not).

What's make the class object uncollectable, could we not fix this instead?

(You are right in the fact that I don't think there is anything that 
would obtains a repoview before reposetup is called)

Cheers,
Gregory Szorc - Jan. 18, 2016, 1:40 a.m.
> On Jan 17, 2016, at 17:24, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> 
> 
>> On 01/17/2016 05:19 PM, Gregory Szorc wrote:
>> 
>> 
>>> On Jan 17, 2016, at 17:10, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
>>> 
>>> 
>>> 
>>>> On 01/17/2016 02:18 PM, Gregory Szorc wrote:
>>>> # HG changeset patch
>>>> # User Gregory Szorc <gregory.szorc@gmail.com>
>>>> # Date 1453069024 28800
>>>> #      Sun Jan 17 14:17:04 2016 -0800
>>>> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
>>>> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
>>>> repoview: cache repoview class instances (issue5043)
>>>> 
>>>> The Python garbage collector was reporting the proxycls instances
>>>> created on every invocation of localrepository.filtered() were
>>>> uncollectable. This led to leaking memory.
>>>> 
>>>> We fix the leak by introducing a cache of the proxy classes. After
>>>> this patch, the garbage collector no longer reports uncollectable
>>>> objects after each iteration of `hg convert`. This likely prevents
>>>> memory leaks in a number of other operations as well (pretty much
>>>> anything relying heavily on repo.filtered() calls).
>>>> 
>>>> We stop short of caching the instantiated proxycls instances, which
>>>> we /should/ be able to do safely. This feels a bit too risky for
>>>> a change just before code freeze, however.
>>> 
>>> I'm either missing something or the behavior of this patch is incorrect.
>>> 
>>> If the class of the repo objet changes, we won't invalidate the cache, and the next call to "filtered" will return an object with an outdated proxy class.
>> 
>> That's a valid observation. But the class should be static after reposetup functions are called. And we shouldn't be obtaining a repoview before reposetup is called, right?
> 
> I would not bet on class being static (and I'am pretty sure it is not).
> 
> What's make the class object uncollectable, could we not fix this instead?

I, uh, have no idea. Either classes aren't garbage collected or the ref to repo.__class__ is somehow making it not collectible. This is a minor leak compared to the first 2 patches. So if you could queue those, it would be appreciated.

> 
> (You are right in the fact that I don't think there is anything that would obtains a repoview before reposetup is called)
> 
> Cheers,
> 
> -- 
> Pierre-Yves David
Simon King - Jan. 18, 2016, 9:42 a.m.
On Mon, Jan 18, 2016 at 1:40 AM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

>
>
> > On Jan 17, 2016, at 17:24, Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
> >
> >
> >
> >> On 01/17/2016 05:19 PM, Gregory Szorc wrote:
> >>
> >>
> >>> On Jan 17, 2016, at 17:10, Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
> >>>
> >>>
> >>>
> >>>> On 01/17/2016 02:18 PM, Gregory Szorc wrote:
> >>>> # HG changeset patch
> >>>> # User Gregory Szorc <gregory.szorc@gmail.com>
> >>>> # Date 1453069024 28800
> >>>> #      Sun Jan 17 14:17:04 2016 -0800
> >>>> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
> >>>> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
> >>>> repoview: cache repoview class instances (issue5043)
> >>>>
> >>>> The Python garbage collector was reporting the proxycls instances
> >>>> created on every invocation of localrepository.filtered() were
> >>>> uncollectable. This led to leaking memory.
> >>>>
> >>>> We fix the leak by introducing a cache of the proxy classes. After
> >>>> this patch, the garbage collector no longer reports uncollectable
> >>>> objects after each iteration of `hg convert`. This likely prevents
> >>>> memory leaks in a number of other operations as well (pretty much
> >>>> anything relying heavily on repo.filtered() calls).
> >>>>
> >>>> We stop short of caching the instantiated proxycls instances, which
> >>>> we /should/ be able to do safely. This feels a bit too risky for
> >>>> a change just before code freeze, however.
> >>>
> >>> I'm either missing something or the behavior of this patch is
> incorrect.
> >>>
> >>> If the class of the repo objet changes, we won't invalidate the cache,
> and the next call to "filtered" will return an object with an outdated
> proxy class.
> >>
> >> That's a valid observation. But the class should be static after
> reposetup functions are called. And we shouldn't be obtaining a repoview
> before reposetup is called, right?
> >
> > I would not bet on class being static (and I'am pretty sure it is not).
> >
> > What's make the class object uncollectable, could we not fix this
> instead?
>
> I, uh, have no idea. Either classes aren't garbage collected or the ref to
> repo.__class__ is somehow making it not collectible. This is a minor leak
> compared to the first 2 patches. So if you could queue those, it would be
> appreciated.
>
>
I think classes *are* garbage collected, but not as soon as the obvious
reference is dropped (presumably there's a reference cycle):

#######
import sys
print sys.version

class A(object):
    pass

class B(A):
    pass

print A.__subclasses__()

del B
print A.__subclasses__()

import gc
gc.collect()
print A.__subclasses__()
#######


Output:
2.7.10 (default, Oct 28 2015, 11:24:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)]
[<class '__main__.B'>]
[<class '__main__.B'>]
[]

Simon
Yuya Nishihara - Jan. 18, 2016, 2:58 p.m.
On Mon, 18 Jan 2016 09:42:10 +0000, Simon King wrote:
> > >>>> On 01/17/2016 02:18 PM, Gregory Szorc wrote:
> > >>>> # HG changeset patch
> > >>>> # User Gregory Szorc <gregory.szorc@gmail.com>
> > >>>> # Date 1453069024 28800
> > >>>> #      Sun Jan 17 14:17:04 2016 -0800
> > >>>> # Node ID 3a418d20ca9d0345416d65b82fd3f49101bcc5b2
> > >>>> # Parent  91969eee20a1dcd52470258164bc44528861a4c6
> > >>>> repoview: cache repoview class instances (issue5043)
> > >>>>
> > >>>> The Python garbage collector was reporting the proxycls instances
> > >>>> created on every invocation of localrepository.filtered() were
> > >>>> uncollectable. This led to leaking memory.
> > >>>>
> > >>>> We fix the leak by introducing a cache of the proxy classes. After
> > >>>> this patch, the garbage collector no longer reports uncollectable
> > >>>> objects after each iteration of `hg convert`. This likely prevents
> > >>>> memory leaks in a number of other operations as well (pretty much
> > >>>> anything relying heavily on repo.filtered() calls).
> > >>>>
> > >>>> We stop short of caching the instantiated proxycls instances, which
> > >>>> we /should/ be able to do safely. This feels a bit too risky for
> > >>>> a change just before code freeze, however.
> > >>>
> > >>> I'm either missing something or the behavior of this patch is
> > incorrect.
> > >>>
> > >>> If the class of the repo objet changes, we won't invalidate the cache,
> > and the next call to "filtered" will return an object with an outdated
> > proxy class.
> > >>
> > >> That's a valid observation. But the class should be static after
> > reposetup functions are called. And we shouldn't be obtaining a repoview
> > before reposetup is called, right?
> > >
> > > I would not bet on class being static (and I'am pretty sure it is not).

Can't it be a cache keyed by repo.__class__ ?

> > > What's make the class object uncollectable, could we not fix this
> > instead?
> >
> > I, uh, have no idea. Either classes aren't garbage collected or the ref to
> > repo.__class__ is somehow making it not collectible. This is a minor leak
> > compared to the first 2 patches. So if you could queue those, it would be
> > appreciated.
> >
> >
> I think classes *are* garbage collected, but not as soon as the obvious
> reference is dropped (presumably there's a reference cycle):

Perhaps you are right.

import gc
from mercurial import hg, repoview, ui as uimod

ui = uimod.ui()
repo = hg.repository(ui)
for x in xrange(100):
    #gc.collect()
    repo.unfiltered().filtered('visible')
    print len(repoview.repoview.__subclasses__())

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -361,16 +361,18 @@  class localrepository(object):
         # - new obsolescence marker,
         # - working directory parent change,
         # - bookmark changes
         self.filteredrevcache = {}
 
         # generic mapping between names and nodes
         self.names = namespaces.namespaces()
 
+        self._repoviewfn = repoview.repoviewcache(weakref.ref(self))
+
     def close(self):
         self._writecaches()
 
     def _writecaches(self):
         if self._revbranchcache:
             self._revbranchcache.write()
 
     def _restrictcapabilities(self, caps):
@@ -448,21 +450,17 @@  class localrepository(object):
     def unfiltered(self):
         """Return unfiltered version of the repository
 
         Intended to be overwritten by filtered repo."""
         return self
 
     def filtered(self, name):
         """Return a filtered version of a repository"""
-        # build a new class with the mixin and the current class
-        # (possibly subclass of the repo)
-        class proxycls(repoview.repoview, self.unfiltered().__class__):
-            pass
-        return proxycls(self, name)
+        return self._repoviewfn(name)
 
     @repofilecache('bookmarks', 'bookmarks.current')
     def _bookmarks(self):
         return bookmarks.bmstore(self)
 
     @property
     def _activebookmark(self):
         return self._bookmarks.active
diff --git a/mercurial/repoview.py b/mercurial/repoview.py
--- a/mercurial/repoview.py
+++ b/mercurial/repoview.py
@@ -339,8 +339,20 @@  class repoview(object):
         return delattr(self._unfilteredrepo, attr)
 
     # The `requirements` attribute is initialized during __init__. But
     # __getattr__ won't be called as it also exists on the class. We need
     # explicit forwarding to main repo here
     @property
     def requirements(self):
         return self._unfilteredrepo.requirements
+
+def repoviewcache(reporef):
+    cache = {}
+
+    def getview(filtername):
+        if filtername not in cache:
+            class proxycls(repoview, reporef().__class__):
+                pass
+            cache[filtername] = proxycls
+        return cache[filtername](reporef(), filtername)
+
+    return getview
diff --git a/mercurial/statichttprepo.py b/mercurial/statichttprepo.py
--- a/mercurial/statichttprepo.py
+++ b/mercurial/statichttprepo.py
@@ -8,25 +8,27 @@ 
 # GNU General Public License version 2 or any later version.
 
 from __future__ import absolute_import
 
 import errno
 import os
 import urllib
 import urllib2
+import weakref
 
 from .i18n import _
 from . import (
     byterange,
     changelog,
     error,
     localrepo,
     manifest,
     namespaces,
+    repoview,
     scmutil,
     store,
     url,
     util,
 )
 
 class httprangereader(object):
     def __init__(self, url, opener):
@@ -158,16 +160,17 @@  class statichttprepository(localrepo.loc
         self.changelog = changelog.changelog(self.svfs)
         self._tags = None
         self.nodetagscache = None
         self._branchcaches = {}
         self._revbranchcache = None
         self.encodepats = None
         self.decodepats = None
         self._transref = None
+        self._repoviewfn = repoview.repoviewcache(weakref.ref(self))
 
     def _restrictcapabilities(self, caps):
         caps = super(statichttprepository, self)._restrictcapabilities(caps)
         return caps.difference(["pushkey"])
 
     def url(self):
         return self._url