Patchwork revset: fix infinite recursion if called with __len__ first

login
register
mail settings
Submitter Maciej Fijalkowski
Date March 31, 2016, 2:44 p.m.
Message ID <eaaeff1b98571b90ec71.1459435478@brick.arcode.com>
Download mbox | patch
Permalink /patch/14205/
State Changes Requested
Delegated to: Pierre-Yves David
Headers show

Comments

Maciej Fijalkowski - March 31, 2016, 2:44 p.m.
# HG changeset patch
# User Maciej Fijalkowski <fijall@gmail.com>
# Date 1459435465 -7200
#      Thu Mar 31 16:44:25 2016 +0200
# Node ID eaaeff1b98571b90ec71614c75126e64e98c004b
# Parent  ff0d3b6b287f89594bd8d0308fe2810d2a18ea01
revset: fix infinite recursion if called with __len__ first

The problem here is simple - set(x) can (but doesn't have to) call
__len__ first. If we do that, __len__ calls set which calls __len__
again and we end up with inifinite recursion that gets swallowed and
gets interesting results. Call it properly on the subset instead,
fixes test-bisect2.t on PyPy
Pierre-Yves David - April 1, 2016, 7 a.m.
On 03/31/2016 07:44 AM, Maciej Fijalkowski wrote:
> # HG changeset patch
> # User Maciej Fijalkowski <fijall@gmail.com>
> # Date 1459435465 -7200
> #      Thu Mar 31 16:44:25 2016 +0200
> # Node ID eaaeff1b98571b90ec71614c75126e64e98c004b
> # Parent  ff0d3b6b287f89594bd8d0308fe2810d2a18ea01
> revset: fix infinite recursion if called with __len__ first
>
> The problem here is simple - set(x) can (but doesn't have to) call
> __len__ first. If we do that, __len__ calls set which calls __len__
> again and we end up with inifinite recursion that gets swallowed and
> gets interesting results. Call it properly on the subset instead,
> fixes test-bisect2.t on PyPy
>
> diff -r ff0d3b6b287f -r eaaeff1b9857 mercurial/revset.py
> --- a/mercurial/revset.py	Tue Mar 29 12:29:00 2016 -0500
> +++ b/mercurial/revset.py	Thu Mar 31 16:44:25 2016 +0200
> @@ -3020,7 +3020,7 @@
>
>       def __len__(self):
>           # Basic implementation to be changed in future patches.
> -        l = baseset([r for r in self])
> +        l = baseset([r for r in self._subset if self._condition(r)])

yikes, duplicating the logic for an obscure reason seems suboptimal. 
Moreover, without a comment, the next guy wil factorise this :-/


What about

   baseset([r for r in iter(self)])

or maybe even "better".

   l = 0
   for r in self:
      l += 1
   return l
Maciej Fijalkowski - April 1, 2016, 7:03 a.m.
On Fri, Apr 1, 2016 at 9:00 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
>
>
> On 03/31/2016 07:44 AM, Maciej Fijalkowski wrote:
>>
>> # HG changeset patch
>> # User Maciej Fijalkowski <fijall@gmail.com>
>> # Date 1459435465 -7200
>> #      Thu Mar 31 16:44:25 2016 +0200
>> # Node ID eaaeff1b98571b90ec71614c75126e64e98c004b
>> # Parent  ff0d3b6b287f89594bd8d0308fe2810d2a18ea01
>> revset: fix infinite recursion if called with __len__ first
>>
>> The problem here is simple - set(x) can (but doesn't have to) call
>> __len__ first. If we do that, __len__ calls set which calls __len__
>> again and we end up with inifinite recursion that gets swallowed and
>> gets interesting results. Call it properly on the subset instead,
>> fixes test-bisect2.t on PyPy
>>
>> diff -r ff0d3b6b287f -r eaaeff1b9857 mercurial/revset.py
>> --- a/mercurial/revset.py       Tue Mar 29 12:29:00 2016 -0500
>> +++ b/mercurial/revset.py       Thu Mar 31 16:44:25 2016 +0200
>> @@ -3020,7 +3020,7 @@
>>
>>       def __len__(self):
>>           # Basic implementation to be changed in future patches.
>> -        l = baseset([r for r in self])
>> +        l = baseset([r for r in self._subset if self._condition(r)])
>
>
> yikes, duplicating the logic for an obscure reason seems suboptimal.
> Moreover, without a comment, the next guy wil factorise this :-/
>
>
> What about
>
>   baseset([r for r in iter(self)])
>
> or maybe even "better".
>
>   l = 0
>   for r in self:
>      l += 1
>   return l
>
> --
> Pierre-Yves David

In this particular situation baseset(r for r in self) would do
(generator expression won't call __len__) but it's very brittle one
way or another. On the other hand it's brittle by design - you *can't*
call __iter__ from __len__ because its not a given that someone won't
call __len__ in the first place
Pierre-Yves David - April 1, 2016, 7:07 a.m.
On 04/01/2016 12:03 AM, Maciej Fijalkowski wrote:
> On Fri, Apr 1, 2016 at 9:00 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>>
>>
>> On 03/31/2016 07:44 AM, Maciej Fijalkowski wrote:
>>>
>>> # HG changeset patch
>>> # User Maciej Fijalkowski <fijall@gmail.com>
>>> # Date 1459435465 -7200
>>> #      Thu Mar 31 16:44:25 2016 +0200
>>> # Node ID eaaeff1b98571b90ec71614c75126e64e98c004b
>>> # Parent  ff0d3b6b287f89594bd8d0308fe2810d2a18ea01
>>> revset: fix infinite recursion if called with __len__ first
>>>
>>> The problem here is simple - set(x) can (but doesn't have to) call
>>> __len__ first. If we do that, __len__ calls set which calls __len__
>>> again and we end up with inifinite recursion that gets swallowed and
>>> gets interesting results. Call it properly on the subset instead,
>>> fixes test-bisect2.t on PyPy
>>>
>>> diff -r ff0d3b6b287f -r eaaeff1b9857 mercurial/revset.py
>>> --- a/mercurial/revset.py       Tue Mar 29 12:29:00 2016 -0500
>>> +++ b/mercurial/revset.py       Thu Mar 31 16:44:25 2016 +0200
>>> @@ -3020,7 +3020,7 @@
>>>
>>>        def __len__(self):
>>>            # Basic implementation to be changed in future patches.
>>> -        l = baseset([r for r in self])
>>> +        l = baseset([r for r in self._subset if self._condition(r)])
>>
>>
>> yikes, duplicating the logic for an obscure reason seems suboptimal.
>> Moreover, without a comment, the next guy wil factorise this :-/
>>
>>
>> What about
>>
>>    baseset([r for r in iter(self)])
>>
>> or maybe even "better".
>>
>>    l = 0
>>    for r in self:
>>       l += 1
>>    return l
>>
>> --
>> Pierre-Yves David
>
> In this particular situation baseset(r for r in self) would do
> (generator expression won't call __len__) but it's very brittle one
> way or another. On the other hand it's brittle by design - you *can't*
> call __iter__ from __len__ because its not a given that someone won't
> call __len__ in the first place

urg, lets go for the generator expression (with a small comment 
explaining why).

Patch

diff -r ff0d3b6b287f -r eaaeff1b9857 mercurial/revset.py
--- a/mercurial/revset.py	Tue Mar 29 12:29:00 2016 -0500
+++ b/mercurial/revset.py	Thu Mar 31 16:44:25 2016 +0200
@@ -3020,7 +3020,7 @@ 
 
     def __len__(self):
         # Basic implementation to be changed in future patches.
-        l = baseset([r for r in self])
+        l = baseset([r for r in self._subset if self._condition(r)])
         return len(l)
 
     def sort(self, reverse=False):