Patchwork revset: fix generatorset race condition

login
register
mail settings
Submitter Durham Goode
Date March 26, 2014, 12:13 a.m.
Message ID <719d5d962b0c8630c153.1395792813@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/4069/
State Superseded
Commit 47d43e2323c5f8c6492d8ce01117424c716ceaea
Headers show

Comments

Durham Goode - March 26, 2014, 12:13 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1395789007 25200
#      Tue Mar 25 16:10:07 2014 -0700
# Node ID 719d5d962b0c8630c153453a48bfaeccbe9de2f7
# Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
revset: fix generatorset race condition

If two things were iterating over a generatorset at the same time, they could
miss out on the things the other was generating, resulting in incomplete
results. This fixes it by making it possible for two things to iterate at once,
by always checking the _genlist at the beginning of each iteration.

I was only able to repro it with pending changes from my other commits, but they
aren't ready yet. So I'm unable to add a test for now.
Gregory Szorc - March 26, 2014, 12:24 a.m.
On 3/25/14, 5:13 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395789007 25200
> #      Tue Mar 25 16:10:07 2014 -0700
> # Node ID 719d5d962b0c8630c153453a48bfaeccbe9de2f7
> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
> revset: fix generatorset race condition
>
> If two things were iterating over a generatorset at the same time, they could
> miss out on the things the other was generating, resulting in incomplete
> results. This fixes it by making it possible for two things to iterate at once,
> by always checking the _genlist at the beginning of each iteration.
>
> I was only able to repro it with pending changes from my other commits, but they
> aren't ready yet. So I'm unable to add a test for now.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2624,10 +2624,8 @@
>           gen: a generator producing the values for the generatorset.
>           """
>           self._gen = gen
> -        self._iter = iter(gen)
>           self._cache = {}
>           self._genlist = baseset([])
> -        self._iterated = False
>           self._finished = False
>
>       def __contains__(self, x):
> @@ -2639,28 +2637,29 @@
>               if l == x:
>                   return True
>
> -        self._finished = True
>           self._cache[x] = False
>           return False
>
>       def __iter__(self):
> -        if self._iterated:
> -            # At least a part of the list should be cached if iteration has
> -            # started over the generatorset.
> -            for l in self._genlist:
> -                yield l
> -
> -        for item in self._consumegen():
> -            yield item
> +        if self._finished:
> +            for x in self._genlist:
> +                yield x

Shouldn't we return here? Otherwise, don't we iterate over self._genlist 
twice if self._finished == True?

Why do we even need this block? Won't the while loop below handle it?

> +
> +        i = 0
> +        genlist = self._genlist
> +        consume = self._consumegen()
> +        while True:
> +            if i < len(genlist):
> +                yield genlist[i]
> +            else:
> +                yield consume.next()
> +            i += 1
>
>       def _consumegen(self):
> -        self._iterated = True
> -
>           for item in self._gen:
>               self._cache[item] = True
>               self._genlist.append(item)
>               yield item
> -
>           self._finished = True
>
>       def set(self):
>
Durham Goode - March 26, 2014, 12:27 a.m.
On 3/25/14 5:24 PM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:

>On 3/25/14, 5:13 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1395789007 25200
>> #      Tue Mar 25 16:10:07 2014 -0700
>> # Node ID 719d5d962b0c8630c153453a48bfaeccbe9de2f7
>> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
>> revset: fix generatorset race condition
>>
>> If two things were iterating over a generatorset at the same time, they
>>could
>> miss out on the things the other was generating, resulting in incomplete
>> results. This fixes it by making it possible for two things to iterate
>>at once,
>> by always checking the _genlist at the beginning of each iteration.
>>
>> I was only able to repro it with pending changes from my other commits,
>>but they
>> aren't ready yet. So I'm unable to add a test for now.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2624,10 +2624,8 @@
>>           gen: a generator producing the values for the generatorset.
>>           """
>>           self._gen = gen
>> -        self._iter = iter(gen)
>>           self._cache = {}
>>           self._genlist = baseset([])
>> -        self._iterated = False
>>           self._finished = False
>>
>>       def __contains__(self, x):
>> @@ -2639,28 +2637,29 @@
>>               if l == x:
>>                   return True
>>
>> -        self._finished = True
>>           self._cache[x] = False
>>           return False
>>
>>       def __iter__(self):
>> -        if self._iterated:
>> -            # At least a part of the list should be cached if
>>iteration has
>> -            # started over the generatorset.
>> -            for l in self._genlist:
>> -                yield l
>> -
>> -        for item in self._consumegen():
>> -            yield item
>> +        if self._finished:
>> +            for x in self._genlist:
>> +                yield x
>
>Shouldn't we return here? Otherwise, don't we iterate over self._genlist
>twice if self._finished == True?
>
>Why do we even need this block? Won't the while loop below handle it?

You are correct, we need a return.  I had this block because doing all
that extra stuff below might have real cost when run across hundreds of
thousands of revs.

>
>> +
>> +        i = 0
>> +        genlist = self._genlist
>> +        consume = self._consumegen()
>> +        while True:
>> +            if i < len(genlist):
>> +                yield genlist[i]
>> +            else:
>> +                yield consume.next()
>> +            i += 1
>>
>>       def _consumegen(self):
>> -        self._iterated = True
>> -
>>           for item in self._gen:
>>               self._cache[item] = True
>>               self._genlist.append(item)
>>               yield item
>> -
>>           self._finished = True
>>
>>       def set(self):
>>
>
>

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2624,10 +2624,8 @@ 
         gen: a generator producing the values for the generatorset.
         """
         self._gen = gen
-        self._iter = iter(gen)
         self._cache = {}
         self._genlist = baseset([])
-        self._iterated = False
         self._finished = False
 
     def __contains__(self, x):
@@ -2639,28 +2637,29 @@ 
             if l == x:
                 return True
 
-        self._finished = True
         self._cache[x] = False
         return False
 
     def __iter__(self):
-        if self._iterated:
-            # At least a part of the list should be cached if iteration has
-            # started over the generatorset.
-            for l in self._genlist:
-                yield l
-
-        for item in self._consumegen():
-            yield item
+        if self._finished:
+            for x in self._genlist:
+                yield x
+
+        i = 0
+        genlist = self._genlist
+        consume = self._consumegen()
+        while True:
+            if i < len(genlist):
+                yield genlist[i]
+            else:
+                yield consume.next()
+            i += 1
 
     def _consumegen(self):
-        self._iterated = True
-
         for item in self._gen:
             self._cache[item] = True
             self._genlist.append(item)
             yield item
-
         self._finished = True
 
     def set(self):