Patchwork [rfc] rfc: call gc at exit of mercurial

login
register
mail settings
Submitter Pierre-Yves David
Date April 5, 2016, 12:21 a.m.
Message ID <570304EF.2050608@ens-lyon.org>
Download mbox | patch
Permalink /patch/14358/
State Not Applicable
Headers show

Comments

Pierre-Yves David - April 5, 2016, 12:21 a.m.
TL;DR; committer, pypy skip garbage collection at process exit, not call 
__del__ in this case. This seems a divergence from cpython standard. Are 
we okay with that?

More details below.

On 04/03/2016 10:51 AM, Gregory Szorc wrote:
> On Fri, Apr 1, 2016 at 2:13 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 04/01/2016 12:10 AM, Maciej Fijalkowski wrote:
>
>         On Fri, Apr 1, 2016 at 9:09 AM, Gregory Szorc
>         <gregory.szorc@gmail.com <mailto:gregory.szorc@gmail.com>> wrote:
>
>
>
>                 On Mar 31, 2016, at 23:58, Maciej Fijalkowski
>                 <fijall@gmail.com <mailto:fijall@gmail.com>> wrote:
>
>                 # HG changeset patch
>                 # User Maciej Fijalkowski <fijall@gmail.com
>                 <mailto:fijall@gmail.com>>
>                 # Date 1459493914 -7200
>                 #      Fri Apr 01 08:58:34 2016 +0200
>                 # Node ID 66b3737b62635691b5a205dafc80e640880b77ca
>                 # Parent  9ca7854b963295b8b56d276b903623ac8277f18d
>                 rfc: call gc at exit of mercurial
>
>                 This is not a proper patch, just a visualization of a
>                 hack I did.
>                 Locks have a __del__, which test-devel-warnings rely on
>                 being
>                 called at the exit of a program. Locks also have
>                 __exit__ which is
>                 a more proper way to handle locks, but that test does
>                 not respect it.
>                 I would like to know what's the proper way of handling
>                 this - should
>                 I fix test, should I just call the GC (which will make
>                 the command
>                 line tool slower) or what's the proposed solution
>
>
>             __del__ should be purged and banned from the code base
>             because it undermines GC and can lead to memory leaks.
>
>             For holding on to temporary resources, context managers
>             (preferred) or try..finally should be used.
>
>
>         So what do we do with lock.__del__ and how tests rely on it?
>
>
>     Can we get details about this test failure and why it relies on it?
>
>
> The existence of lock.__del__ is to catch bugs where people forget to
> call lock.release. Unfortunately, the presence of __del__ does bad
> things for garbage collection. So, we want to have a mechanism to detect
> failure to call lock.release but it can't use __del__. What can we do?

It seems like no standard code path rely on this __del__ and it only 
exist as a double safety. Moreover, the __del__ pattern create issue for 
gc in case of cycle and we are quite careful not to create cycle in our 
low level object.

We seems to have 2 code path that trigger this safeguard in tests. On is 
the devel warning on-purpose-clownly code and the second one is an 
extensions enforcing a hard crash to test fncache recovery.


About the specific issue here:

Pypy does not automatically garbage object when reference count hit 
zero. It also skip the garbase collecting (and __del__ call) when the 
programme exit. That seems like a divergence from the documented cpython 
behavior.

As we are not supposed to rely on this behavior, we might consider that 
Mercurial will be fine with this behavior. And in practice the only code 
broken by this lack of __del__ call is one where the code is especially 
wrong on purpose. This test is not testing for the __del__ call or any 
other implicite rollback of the transaction. So we can just call it 
directly.

If we are fine with pypy not running any __del__ the following patch is 
probably all we want. I've CCed the other committers for opinions.

    >
Maciej Fijalkowski - April 5, 2016, 5:31 a.m.
class A(object):
    def __del__(self):
        print "del"

class B(object):
    pass

b = B()
b.b = b
b.a = A()


This example does not call __del__ in CPython either.

The __del__ is not guaranteed to be called - that's why there is a
painful module finalization procedure where CPython is trying to call
"as much as possible", but there are still no guarantees. If you add
del b; gc.collect() you will see "del" printed. Of course this
involves a cycle, but cycles can come in ways that you don't expect
them and PyPy simply says "everything is GCed". I think it's very much
in line with what python-dev thinks.

On Tue, Apr 5, 2016 at 2:21 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> TL;DR; committer, pypy skip garbage collection at process exit, not call
> __del__ in this case. This seems a divergence from cpython standard. Are we
> okay with that?
>
> More details below.
>
> On 04/03/2016 10:51 AM, Gregory Szorc wrote:
>>
>> On Fri, Apr 1, 2016 at 2:13 AM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>> wrote:
>>
>>
>>
>>     On 04/01/2016 12:10 AM, Maciej Fijalkowski wrote:
>>
>>         On Fri, Apr 1, 2016 at 9:09 AM, Gregory Szorc
>>         <gregory.szorc@gmail.com <mailto:gregory.szorc@gmail.com>> wrote:
>>
>>
>>
>>                 On Mar 31, 2016, at 23:58, Maciej Fijalkowski
>>                 <fijall@gmail.com <mailto:fijall@gmail.com>> wrote:
>>
>>                 # HG changeset patch
>>                 # User Maciej Fijalkowski <fijall@gmail.com
>>                 <mailto:fijall@gmail.com>>
>>
>>                 # Date 1459493914 -7200
>>                 #      Fri Apr 01 08:58:34 2016 +0200
>>                 # Node ID 66b3737b62635691b5a205dafc80e640880b77ca
>>                 # Parent  9ca7854b963295b8b56d276b903623ac8277f18d
>>                 rfc: call gc at exit of mercurial
>>
>>                 This is not a proper patch, just a visualization of a
>>                 hack I did.
>>                 Locks have a __del__, which test-devel-warnings rely on
>>                 being
>>                 called at the exit of a program. Locks also have
>>                 __exit__ which is
>>                 a more proper way to handle locks, but that test does
>>                 not respect it.
>>                 I would like to know what's the proper way of handling
>>                 this - should
>>                 I fix test, should I just call the GC (which will make
>>                 the command
>>                 line tool slower) or what's the proposed solution
>>
>>
>>             __del__ should be purged and banned from the code base
>>             because it undermines GC and can lead to memory leaks.
>>
>>             For holding on to temporary resources, context managers
>>             (preferred) or try..finally should be used.
>>
>>
>>         So what do we do with lock.__del__ and how tests rely on it?
>>
>>
>>     Can we get details about this test failure and why it relies on it?
>>
>>
>> The existence of lock.__del__ is to catch bugs where people forget to
>> call lock.release. Unfortunately, the presence of __del__ does bad
>> things for garbage collection. So, we want to have a mechanism to detect
>> failure to call lock.release but it can't use __del__. What can we do?
>
>
> It seems like no standard code path rely on this __del__ and it only exist
> as a double safety. Moreover, the __del__ pattern create issue for gc in
> case of cycle and we are quite careful not to create cycle in our low level
> object.
>
> We seems to have 2 code path that trigger this safeguard in tests. On is the
> devel warning on-purpose-clownly code and the second one is an extensions
> enforcing a hard crash to test fncache recovery.
>
>
> About the specific issue here:
>
> Pypy does not automatically garbage object when reference count hit zero. It
> also skip the garbase collecting (and __del__ call) when the programme exit.
> That seems like a divergence from the documented cpython behavior.
>
> As we are not supposed to rely on this behavior, we might consider that
> Mercurial will be fine with this behavior. And in practice the only code
> broken by this lack of __del__ call is one where the code is especially
> wrong on purpose. This test is not testing for the __del__ call or any other
> implicite rollback of the transaction. So we can just call it directly.
>
> If we are fine with pypy not running any __del__ the following patch is
> probably all we want. I've CCed the other committers for opinions.
>
> --- a/tests/test-devel-warnings.t
> +++ b/tests/test-devel-warnings.t
> @@ -9,10 +9,11 @@
>    > command = cmdutil.command(cmdtable)
>    >
>    > @command('buggylocking', [], '')
>    > def buggylocking(ui, repo):
>    >     tr = repo.transaction('buggy')
> +  >     tr.release() # make sure we rollback the transaction as pypy skip
> the __del__
>    >     lo = repo.lock()
>    >     wl = repo.wlock()
>    >     wl.release()
>    >     lo.release()
>    >
>
>
> --
> Pierre-Yves David
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pierre-Yves David - April 5, 2016, 6:36 a.m.
On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
> class A(object):
>      def __del__(self):
>          print "del"
>
> class B(object):
>      pass
>
> b = B()
> b.b = b
> b.a = A()
>
>
> This example does not call __del__ in CPython either.
>
> The __del__ is not guaranteed to be called - that's why there is a
> painful module finalization procedure where CPython is trying to call
> "as much as possible", but there are still no guarantees. If you add
> del b; gc.collect() you will see "del" printed. Of course this
> involves a cycle, but cycles can come in ways that you don't expect
> them and PyPy simply says "everything is GCed". I think it's very much
> in line with what python-dev thinks.

Which is why we have __del__ in very few object and we deploy massive 
effort to ensure their don't get caught in cycle and mostly succeeding 
at this. (Kind of the same we put a lot of effort into making sure 
__del__ are never really called but keep them as double safety).

So in the case we care about (no cycle) Cpython would call our __del__, 
right?
Maciej Fijalkowski - April 5, 2016, 6:37 a.m.
On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
>
>
> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>
>> class A(object):
>>      def __del__(self):
>>          print "del"
>>
>> class B(object):
>>      pass
>>
>> b = B()
>> b.b = b
>> b.a = A()
>>
>>
>> This example does not call __del__ in CPython either.
>>
>> The __del__ is not guaranteed to be called - that's why there is a
>> painful module finalization procedure where CPython is trying to call
>> "as much as possible", but there are still no guarantees. If you add
>> del b; gc.collect() you will see "del" printed. Of course this
>> involves a cycle, but cycles can come in ways that you don't expect
>> them and PyPy simply says "everything is GCed". I think it's very much
>> in line with what python-dev thinks.
>
>
> Which is why we have __del__ in very few object and we deploy massive effort
> to ensure their don't get caught in cycle and mostly succeeding at this.
> (Kind of the same we put a lot of effort into making sure __del__ are never
> really called but keep them as double safety).
>
> So in the case we care about (no cycle) Cpython would call our __del__,
> right?
>
> --
> Pierre-Yves David

Yes, but I would argue you can create cycles without knowing. E.g.

def f():
    try:
       some_stuff
    except:
       x = sys.exc_info()

creates a cycle. There are also ways to create cycles with passing
global functions around etc.
Pierre-Yves David - April 5, 2016, 7:10 a.m.
On 04/04/2016 11:37 PM, Maciej Fijalkowski wrote:
> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>>
>>
>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>>
>>> class A(object):
>>>       def __del__(self):
>>>           print "del"
>>>
>>> class B(object):
>>>       pass
>>>
>>> b = B()
>>> b.b = b
>>> b.a = A()
>>>
>>>
>>> This example does not call __del__ in CPython either.
>>>
>>> The __del__ is not guaranteed to be called - that's why there is a
>>> painful module finalization procedure where CPython is trying to call
>>> "as much as possible", but there are still no guarantees. If you add
>>> del b; gc.collect() you will see "del" printed. Of course this
>>> involves a cycle, but cycles can come in ways that you don't expect
>>> them and PyPy simply says "everything is GCed". I think it's very much
>>> in line with what python-dev thinks.
>>
>>
>> Which is why we have __del__ in very few object and we deploy massive effort
>> to ensure their don't get caught in cycle and mostly succeeding at this.
>> (Kind of the same we put a lot of effort into making sure __del__ are never
>> really called but keep them as double safety).
>>
>> So in the case we care about (no cycle) Cpython would call our __del__,
>> right?
>>
>> --
>> Pierre-Yves David
>
> Yes, but I would argue you can create cycles without knowing. E.g.
>
> def f():
>      try:
>         some_stuff
>      except:
>         x = sys.exc_info()
>
> creates a cycle. There are also ways to create cycles with passing
> global functions around etc.

<situation-summary>

Sure we can always makes more mistake (and we sometimes do). But in most 
of the rare case where __del__ catch one of this mistake, we don't have 
an extra mistake that prevent it from catching it! (Okay, I admit, I 
make this sentence extra confusing on purpose).

Simpler version (with cpython):
1) The official way is not to call __del__, we don't rely on it for well 
written code, (this kind of greg's point)
2) However, in rare case, the situation is extra buggy and __del__ is 
useful, (this is kind of my point),
3) However, in rare² care, the situation is extra² buggy and a cycle 
prevent __del__ to be called.

The bad new is that Pypy mostly remove (2), and make (3) replace most of 
it occurrence.

</situation-summary>

However, the good news is that we don't corrupt or loose anything in 
(2). We just leave the repo into an unrecovered state that people can 
recover with `hg recover`, so buggy code/situation expose slightly more 
anoying behavior. This is probably fine. (There might be some super old 
third party extension that will need to update itself).

someone should have a look at the other __del__ behavior and see if they 
have as "harmless" behavior (eg: lock.__del__ would leave the repository 
locked, but next run will remove that lock at the process is dead)

Cheers,
Maciej Fijalkowski - April 5, 2016, 7:24 a.m.
Right, that's a good summary. I was merely pointing out that from
python perspective this behavior is ok (not whether from mercurial
perspective it's ok  or not - it's up to you to decided :-)

On Tue, Apr 5, 2016 at 9:10 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
>
>
> On 04/04/2016 11:37 PM, Maciej Fijalkowski wrote:
>>
>> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>>>
>>>
>>>
>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>>>
>>>>
>>>> class A(object):
>>>>       def __del__(self):
>>>>           print "del"
>>>>
>>>> class B(object):
>>>>       pass
>>>>
>>>> b = B()
>>>> b.b = b
>>>> b.a = A()
>>>>
>>>>
>>>> This example does not call __del__ in CPython either.
>>>>
>>>> The __del__ is not guaranteed to be called - that's why there is a
>>>> painful module finalization procedure where CPython is trying to call
>>>> "as much as possible", but there are still no guarantees. If you add
>>>> del b; gc.collect() you will see "del" printed. Of course this
>>>> involves a cycle, but cycles can come in ways that you don't expect
>>>> them and PyPy simply says "everything is GCed". I think it's very much
>>>> in line with what python-dev thinks.
>>>
>>>
>>>
>>> Which is why we have __del__ in very few object and we deploy massive
>>> effort
>>> to ensure their don't get caught in cycle and mostly succeeding at this.
>>> (Kind of the same we put a lot of effort into making sure __del__ are
>>> never
>>> really called but keep them as double safety).
>>>
>>> So in the case we care about (no cycle) Cpython would call our __del__,
>>> right?
>>>
>>> --
>>> Pierre-Yves David
>>
>>
>> Yes, but I would argue you can create cycles without knowing. E.g.
>>
>> def f():
>>      try:
>>         some_stuff
>>      except:
>>         x = sys.exc_info()
>>
>> creates a cycle. There are also ways to create cycles with passing
>> global functions around etc.
>
>
> <situation-summary>
>
> Sure we can always makes more mistake (and we sometimes do). But in most of
> the rare case where __del__ catch one of this mistake, we don't have an
> extra mistake that prevent it from catching it! (Okay, I admit, I make this
> sentence extra confusing on purpose).
>
> Simpler version (with cpython):
> 1) The official way is not to call __del__, we don't rely on it for well
> written code, (this kind of greg's point)
> 2) However, in rare case, the situation is extra buggy and __del__ is
> useful, (this is kind of my point),
> 3) However, in rare² care, the situation is extra² buggy and a cycle prevent
> __del__ to be called.
>
> The bad new is that Pypy mostly remove (2), and make (3) replace most of it
> occurrence.
>
> </situation-summary>
>
> However, the good news is that we don't corrupt or loose anything in (2). We
> just leave the repo into an unrecovered state that people can recover with
> `hg recover`, so buggy code/situation expose slightly more anoying behavior.
> This is probably fine. (There might be some super old third party extension
> that will need to update itself).
>
> someone should have a look at the other __del__ behavior and see if they
> have as "harmless" behavior (eg: lock.__del__ would leave the repository
> locked, but next run will remove that lock at the process is dead)
>
> Cheers,
>
> --
> Pierre-Yves David
Yuya Nishihara - April 5, 2016, 1:59 p.m.
On Tue, 5 Apr 2016 00:10:17 -0700, Pierre-Yves David wrote:
> However, the good news is that we don't corrupt or loose anything in 
> (2). We just leave the repo into an unrecovered state that people can 
> recover with `hg recover`, so buggy code/situation expose slightly more 
> anoying behavior. This is probably fine. (There might be some super old 
> third party extension that will need to update itself).
> 
> someone should have a look at the other __del__ behavior and see if they 
> have as "harmless" behavior (eg: lock.__del__ would leave the repository 
> locked, but next run will remove that lock at the process is dead)

A failure case of atomictempfile would leave garbage, which is why I've
accepted gc.collect() patch for chg, 5346e9b910fc. But leaving garbage could
be considered "harmless".
Gregory Szorc - May 4, 2016, 6:51 a.m.
> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall@gmail.com> wrote:
> 
> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>> 
>> 
>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>> 
>>> class A(object):
>>>     def __del__(self):
>>>         print "del"
>>> 
>>> class B(object):
>>>     pass
>>> 
>>> b = B()
>>> b.b = b
>>> b.a = A()
>>> 
>>> 
>>> This example does not call __del__ in CPython either.
>>> 
>>> The __del__ is not guaranteed to be called - that's why there is a
>>> painful module finalization procedure where CPython is trying to call
>>> "as much as possible", but there are still no guarantees. If you add
>>> del b; gc.collect() you will see "del" printed. Of course this
>>> involves a cycle, but cycles can come in ways that you don't expect
>>> them and PyPy simply says "everything is GCed". I think it's very much
>>> in line with what python-dev thinks.
>> 
>> 
>> Which is why we have __del__ in very few object and we deploy massive effort
>> to ensure their don't get caught in cycle and mostly succeeding at this.
>> (Kind of the same we put a lot of effort into making sure __del__ are never
>> really called but keep them as double safety).
>> 
>> So in the case we care about (no cycle) Cpython would call our __del__,
>> right?
>> 
>> --
>> Pierre-Yves David
> 
> Yes, but I would argue you can create cycles without knowing. E.g.
> 
> def f():
>    try:
>       some_stuff
>    except:
>       x = sys.exc_info()
> 
> creates a cycle. There are also ways to create cycles with passing
> global functions around etc.

This.

We have plenty of cycles in our code. We just don't notice them very often because "hg" processes are short-lived. And what's worse is we don't know we're introducing them unless we go looking for them, often after someone complains about a leak on a large repo.

If you want to create cycles and leak memory, I recommend "hg convert" on thousands of revisions with extensions and hooks installed. Or start a WSGI server.

One of the reasons I want to get Python 3 support is so we can use its tracemalloc module to help debug leaks. The Python 2 tools for finding cycles and leaks (such as guppy and heappy) are a bit harder to use and to integrate into our testing harness.
Maciej Fijalkowski - May 30, 2016, 7:38 a.m.
On Wed, May 4, 2016 at 8:51 AM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
>
>
>> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall@gmail.com> wrote:
>>
>> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>>>
>>>
>>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>>>>
>>>> class A(object):
>>>>     def __del__(self):
>>>>         print "del"
>>>>
>>>> class B(object):
>>>>     pass
>>>>
>>>> b = B()
>>>> b.b = b
>>>> b.a = A()
>>>>
>>>>
>>>> This example does not call __del__ in CPython either.
>>>>
>>>> The __del__ is not guaranteed to be called - that's why there is a
>>>> painful module finalization procedure where CPython is trying to call
>>>> "as much as possible", but there are still no guarantees. If you add
>>>> del b; gc.collect() you will see "del" printed. Of course this
>>>> involves a cycle, but cycles can come in ways that you don't expect
>>>> them and PyPy simply says "everything is GCed". I think it's very much
>>>> in line with what python-dev thinks.
>>>
>>>
>>> Which is why we have __del__ in very few object and we deploy massive effort
>>> to ensure their don't get caught in cycle and mostly succeeding at this.
>>> (Kind of the same we put a lot of effort into making sure __del__ are never
>>> really called but keep them as double safety).
>>>
>>> So in the case we care about (no cycle) Cpython would call our __del__,
>>> right?
>>>
>>> --
>>> Pierre-Yves David
>>
>> Yes, but I would argue you can create cycles without knowing. E.g.
>>
>> def f():
>>    try:
>>       some_stuff
>>    except:
>>       x = sys.exc_info()
>>
>> creates a cycle. There are also ways to create cycles with passing
>> global functions around etc.
>
> This.
>
> We have plenty of cycles in our code. We just don't notice them very often because "hg" processes are short-lived. And what's worse is we don't know we're introducing them unless we go looking for them, often after someone complains about a leak on a large repo.
>
> If you want to create cycles and leak memory, I recommend "hg convert" on thousands of revisions with extensions and hooks installed. Or start a WSGI server.
>
> One of the reasons I want to get Python 3 support is so we can use its tracemalloc module to help debug leaks. The Python 2 tools for finding cycles and leaks (such as guppy and heappy) are a bit harder to use and to integrate into our testing harness.

I thought the consensus was that yes, there are cycles, no, they're
not a problem because we need to close files/resources with context
managers anyway. Why you would want to "debug" cycles?
Gregory Szorc - June 1, 2016, 2:32 a.m.
On Mon, May 30, 2016 at 12:38 AM, Maciej Fijalkowski <fijall@gmail.com>
wrote:

> On Wed, May 4, 2016 at 8:51 AM, Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
> >
> >
> >> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall@gmail.com> wrote:
> >>
> >> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
> >> <pierre-yves.david@ens-lyon.org> wrote:
> >>>
> >>>
> >>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
> >>>>
> >>>> class A(object):
> >>>>     def __del__(self):
> >>>>         print "del"
> >>>>
> >>>> class B(object):
> >>>>     pass
> >>>>
> >>>> b = B()
> >>>> b.b = b
> >>>> b.a = A()
> >>>>
> >>>>
> >>>> This example does not call __del__ in CPython either.
> >>>>
> >>>> The __del__ is not guaranteed to be called - that's why there is a
> >>>> painful module finalization procedure where CPython is trying to call
> >>>> "as much as possible", but there are still no guarantees. If you add
> >>>> del b; gc.collect() you will see "del" printed. Of course this
> >>>> involves a cycle, but cycles can come in ways that you don't expect
> >>>> them and PyPy simply says "everything is GCed". I think it's very much
> >>>> in line with what python-dev thinks.
> >>>
> >>>
> >>> Which is why we have __del__ in very few object and we deploy massive
> effort
> >>> to ensure their don't get caught in cycle and mostly succeeding at
> this.
> >>> (Kind of the same we put a lot of effort into making sure __del__ are
> never
> >>> really called but keep them as double safety).
> >>>
> >>> So in the case we care about (no cycle) Cpython would call our __del__,
> >>> right?
> >>>
> >>> --
> >>> Pierre-Yves David
> >>
> >> Yes, but I would argue you can create cycles without knowing. E.g.
> >>
> >> def f():
> >>    try:
> >>       some_stuff
> >>    except:
> >>       x = sys.exc_info()
> >>
> >> creates a cycle. There are also ways to create cycles with passing
> >> global functions around etc.
> >
> > This.
> >
> > We have plenty of cycles in our code. We just don't notice them very
> often because "hg" processes are short-lived. And what's worse is we don't
> know we're introducing them unless we go looking for them, often after
> someone complains about a leak on a large repo.
> >
> > If you want to create cycles and leak memory, I recommend "hg convert"
> on thousands of revisions with extensions and hooks installed. Or start a
> WSGI server.
> >
> > One of the reasons I want to get Python 3 support is so we can use its
> tracemalloc module to help debug leaks. The Python 2 tools for finding
> cycles and leaks (such as guppy and heappy) are a bit harder to use and to
> integrate into our testing harness.
>
> I thought the consensus was that yes, there are cycles, no, they're
> not a problem because we need to close files/resources with context
> managers anyway. Why you would want to "debug" cycles?
>

Cycles and files/resources are separate things. But both involves leaks:
you can leak file descriptors and sockets by not using context managers or
"finally" blocks and you can leak memory by introducing cycles.

You would want to "debug" cycles to figure out why your long-running
Mercurial process is leaking memory and OOMing. I've had to deal with
Mercurial leaking memory in both the WSGI server and in conversion tools,
like `hg convert` and hg-git. A well-behaved application should not leak
memory nor resources.
Maciej Fijalkowski - June 6, 2016, 7:38 a.m.
On Wed, Jun 1, 2016 at 4:32 AM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> On Mon, May 30, 2016 at 12:38 AM, Maciej Fijalkowski <fijall@gmail.com>
> wrote:
>>
>> On Wed, May 4, 2016 at 8:51 AM, Gregory Szorc <gregory.szorc@gmail.com>
>> wrote:
>> >
>> >
>> >> On Apr 4, 2016, at 23:37, Maciej Fijalkowski <fijall@gmail.com> wrote:
>> >>
>> >> On Tue, Apr 5, 2016 at 8:36 AM, Pierre-Yves David
>> >> <pierre-yves.david@ens-lyon.org> wrote:
>> >>>
>> >>>
>> >>>> On 04/04/2016 10:31 PM, Maciej Fijalkowski wrote:
>> >>>>
>> >>>> class A(object):
>> >>>>     def __del__(self):
>> >>>>         print "del"
>> >>>>
>> >>>> class B(object):
>> >>>>     pass
>> >>>>
>> >>>> b = B()
>> >>>> b.b = b
>> >>>> b.a = A()
>> >>>>
>> >>>>
>> >>>> This example does not call __del__ in CPython either.
>> >>>>
>> >>>> The __del__ is not guaranteed to be called - that's why there is a
>> >>>> painful module finalization procedure where CPython is trying to call
>> >>>> "as much as possible", but there are still no guarantees. If you add
>> >>>> del b; gc.collect() you will see "del" printed. Of course this
>> >>>> involves a cycle, but cycles can come in ways that you don't expect
>> >>>> them and PyPy simply says "everything is GCed". I think it's very
>> >>>> much
>> >>>> in line with what python-dev thinks.
>> >>>
>> >>>
>> >>> Which is why we have __del__ in very few object and we deploy massive
>> >>> effort
>> >>> to ensure their don't get caught in cycle and mostly succeeding at
>> >>> this.
>> >>> (Kind of the same we put a lot of effort into making sure __del__ are
>> >>> never
>> >>> really called but keep them as double safety).
>> >>>
>> >>> So in the case we care about (no cycle) Cpython would call our
>> >>> __del__,
>> >>> right?
>> >>>
>> >>> --
>> >>> Pierre-Yves David
>> >>
>> >> Yes, but I would argue you can create cycles without knowing. E.g.
>> >>
>> >> def f():
>> >>    try:
>> >>       some_stuff
>> >>    except:
>> >>       x = sys.exc_info()
>> >>
>> >> creates a cycle. There are also ways to create cycles with passing
>> >> global functions around etc.
>> >
>> > This.
>> >
>> > We have plenty of cycles in our code. We just don't notice them very
>> > often because "hg" processes are short-lived. And what's worse is we don't
>> > know we're introducing them unless we go looking for them, often after
>> > someone complains about a leak on a large repo.
>> >
>> > If you want to create cycles and leak memory, I recommend "hg convert"
>> > on thousands of revisions with extensions and hooks installed. Or start a
>> > WSGI server.
>> >
>> > One of the reasons I want to get Python 3 support is so we can use its
>> > tracemalloc module to help debug leaks. The Python 2 tools for finding
>> > cycles and leaks (such as guppy and heappy) are a bit harder to use and to
>> > integrate into our testing harness.
>>
>> I thought the consensus was that yes, there are cycles, no, they're
>> not a problem because we need to close files/resources with context
>> managers anyway. Why you would want to "debug" cycles?
>
>
> Cycles and files/resources are separate things. But both involves leaks: you
> can leak file descriptors and sockets by not using context managers or
> "finally" blocks and you can leak memory by introducing cycles.
>
> You would want to "debug" cycles to figure out why your long-running
> Mercurial process is leaking memory and OOMing. I've had to deal with
> Mercurial leaking memory in both the WSGI server and in conversion tools,
> like `hg convert` and hg-git. A well-behaved application should not leak
> memory nor resources.

Not sure where your get your data from, but a well behaved python
interpreter (which includes CPython and PyPy) should not leak memory
through cycles. To be precise: the maximum amount of memory your GC
should consume should be coefficient x your max memory with
coefficient being 1.2-1.3, so unless you're using 70% of your RAM for
your mercurial process, having a cycle won't OOM

Patch

--- a/tests/test-devel-warnings.t
+++ b/tests/test-devel-warnings.t
@@ -9,10 +9,11 @@ 
    > command = cmdutil.command(cmdtable)
    >
    > @command('buggylocking', [], '')
    > def buggylocking(ui, repo):
    >     tr = repo.transaction('buggy')
+  >     tr.release() # make sure we rollback the transaction as pypy 
skip the __del__
    >     lo = repo.lock()
    >     wl = repo.wlock()
    >     wl.release()
    >     lo.release()