Patchwork [V2] revset: fix generatorset race condition

login
register
mail settings
Submitter Durham Goode
Date March 26, 2014, 12:33 a.m.
Message ID <c28ab989810ae838379b.1395793999@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/4070/
State Accepted
Commit 47d43e2323c5f8c6492d8ce01117424c716ceaea
Headers show

Comments

Durham Goode - March 26, 2014, 12:33 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1395789007 25200
#      Tue Mar 25 16:10:07 2014 -0700
# Node ID c28ab989810ae838379b8964dbdd53a300a53c92
# Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
revset: fix generatorset race condition

If two things were iterating over a generatorset at the same time, they could
miss out on the things the other was generating, resulting in incomplete
results. This fixes it by making it possible for two things to iterate at once,
by always checking the _genlist at the beginning of each iteration.

I was only able to repro it with pending changes from my other commits, but they
aren't ready yet. So I'm unable to add a test for now.
Gregory Szorc - March 26, 2014, 12:54 a.m.
On 3/25/14, 5:33 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395789007 25200
> #      Tue Mar 25 16:10:07 2014 -0700
> # Node ID c28ab989810ae838379b8964dbdd53a300a53c92
> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
> revset: fix generatorset race condition
>
> If two things were iterating over a generatorset at the same time, they could
> miss out on the things the other was generating, resulting in incomplete
> results. This fixes it by making it possible for two things to iterate at once,
> by always checking the _genlist at the beginning of each iteration.
>
> I was only able to repro it with pending changes from my other commits, but they
> aren't ready yet. So I'm unable to add a test for now.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2624,10 +2624,8 @@
>           gen: a generator producing the values for the generatorset.
>           """
>           self._gen = gen
> -        self._iter = iter(gen)
>           self._cache = {}
>           self._genlist = baseset([])
> -        self._iterated = False
>           self._finished = False
>
>       def __contains__(self, x):
> @@ -2639,28 +2637,30 @@
>               if l == x:
>                   return True
>
> -        self._finished = True
>           self._cache[x] = False
>           return False
>
>       def __iter__(self):
> -        if self._iterated:
> -            # At least a part of the list should be cached if iteration has
> -            # started over the generatorset.
> -            for l in self._genlist:
> -                yield l
> -
> -        for item in self._consumegen():
> -            yield item
> +        if self._finished:
> +            for x in self._genlist:
> +                yield x
> +            return
> +
> +        i = 0
> +        genlist = self._genlist
> +        consume = self._consumegen()
> +        while True:
> +            if i < len(genlist):
> +                yield genlist[i]
> +            else:
> +                yield consume.next()
> +            i += 1
>
>       def _consumegen(self):
> -        self._iterated = True
> -
>           for item in self._gen:
>               self._cache[item] = True
>               self._genlist.append(item)
>               yield item
> -
>           self._finished = True
>
>       def set(self):

I haven't executed this, but the implementation looks good to me.

Thank you for finding and fixing my bug and for working on additional 
revset performance improvements.
David Soria Parra - March 26, 2014, 5:34 p.m.
Durham Goode <durham@fb.com> writes:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395789007 25200
> #      Tue Mar 25 16:10:07 2014 -0700
> # Node ID c28ab989810ae838379b8964dbdd53a300a53c92
> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
> revset: fix generatorset race condition
>
> If two things were iterating over a generatorset at the same time, they could
> miss out on the things the other was generating, resulting in incomplete
> results. This fixes it by making it possible for two things to iterate at once,
> by always checking the _genlist at the beginning of each iteration.
>
> I was only able to repro it with pending changes from my other commits, but they
> aren't ready yet. So I'm unable to add a test for now.
>

this looks good to me, but I'll leave it for matt or crew to queue it.
Matt Mackall - March 26, 2014, 8:03 p.m.
On Tue, 2014-03-25 at 17:33 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395789007 25200
> #      Tue Mar 25 16:10:07 2014 -0700
> # Node ID c28ab989810ae838379b8964dbdd53a300a53c92
> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
> revset: fix generatorset race condition

Queued for default, thanks.
Augie Fackler - March 26, 2014, 9:04 p.m.
Crewed this, thanks.

On Mar 25, 2014, at 8:33 PM, Durham Goode <durham@fb.com> wrote:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395789007 25200
> #      Tue Mar 25 16:10:07 2014 -0700
> # Node ID c28ab989810ae838379b8964dbdd53a300a53c92
> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
> revset: fix generatorset race condition
> 
> If two things were iterating over a generatorset at the same time, they could
> miss out on the things the other was generating, resulting in incomplete
> results. This fixes it by making it possible for two things to iterate at once,
> by always checking the _genlist at the beginning of each iteration.
> 
> I was only able to repro it with pending changes from my other commits, but they
> aren't ready yet. So I'm unable to add a test for now.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2624,10 +2624,8 @@
>         gen: a generator producing the values for the generatorset.
>         """
>         self._gen = gen
> -        self._iter = iter(gen)
>         self._cache = {}
>         self._genlist = baseset([])
> -        self._iterated = False
>         self._finished = False
> 
>     def __contains__(self, x):
> @@ -2639,28 +2637,30 @@
>             if l == x:
>                 return True
> 
> -        self._finished = True
>         self._cache[x] = False
>         return False
> 
>     def __iter__(self):
> -        if self._iterated:
> -            # At least a part of the list should be cached if iteration has
> -            # started over the generatorset.
> -            for l in self._genlist:
> -                yield l
> -
> -        for item in self._consumegen():
> -            yield item
> +        if self._finished:
> +            for x in self._genlist:
> +                yield x
> +            return
> +
> +        i = 0
> +        genlist = self._genlist
> +        consume = self._consumegen()
> +        while True:
> +            if i < len(genlist):
> +                yield genlist[i]
> +            else:
> +                yield consume.next()
> +            i += 1
> 
>     def _consumegen(self):
> -        self._iterated = True
> -
>         for item in self._gen:
>             self._cache[item] = True
>             self._genlist.append(item)
>             yield item
> -
>         self._finished = True
> 
>     def set(self):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - March 26, 2014, 9:05 p.m.
On 03/26/2014 02:04 PM, Augie Fackler wrote:
> Crewed this, thanks.

I'm rejoicing in advance to have matt meeting divergent changesets

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2624,10 +2624,8 @@ 
         gen: a generator producing the values for the generatorset.
         """
         self._gen = gen
-        self._iter = iter(gen)
         self._cache = {}
         self._genlist = baseset([])
-        self._iterated = False
         self._finished = False
 
     def __contains__(self, x):
@@ -2639,28 +2637,30 @@ 
             if l == x:
                 return True
 
-        self._finished = True
         self._cache[x] = False
         return False
 
     def __iter__(self):
-        if self._iterated:
-            # At least a part of the list should be cached if iteration has
-            # started over the generatorset.
-            for l in self._genlist:
-                yield l
-
-        for item in self._consumegen():
-            yield item
+        if self._finished:
+            for x in self._genlist:
+                yield x
+            return
+
+        i = 0
+        genlist = self._genlist
+        consume = self._consumegen()
+        while True:
+            if i < len(genlist):
+                yield genlist[i]
+            else:
+                yield consume.next()
+            i += 1
 
     def _consumegen(self):
-        self._iterated = True
-
         for item in self._gen:
             self._cache[item] = True
             self._genlist.append(item)
             yield item
-
         self._finished = True
 
     def set(self):