Patchwork [2,of,2] revset: speed up existence checks for ordered filtered sets

login
register
mail settings
Submitter Durham Goode
Date Sept. 21, 2015, 3:51 a.m.
Message ID <46a81d56dfabfc6ed8e1.1442807496@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/10554/
State Superseded
Commit d157e1f18e3f529006217e1584edeec1d9288485
Headers show

Comments

Durham Goode - Sept. 21, 2015, 3:51 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1442793222 25200
#      Sun Sep 20 16:53:42 2015 -0700
# Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
# Parent  9deecd57915da9ed217abafb81cf8cba506b2849
revset: speed up existence checks for ordered filtered sets

Previously, calling 'if foo:' on a ordered filtered set would start iterating in
whatever the current direction was and return if a value was available. If the
current direction was ascending, but the set had a fastdesc available, this
meant we did a lot more work than necessary.

If this was applied without my previous max/min fixes, it would improve max()
performance (this was my first attempt at fixing the issue). Since those
previous fixes went in though, this doesn't have a visible benefit in the
benchmarks, but it does seem clearly better than it was before so I think it
should still go in.
Yuya Nishihara - Sept. 21, 2015, 4:25 p.m.
On Sun, 20 Sep 2015 20:51:36 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1442793222 25200
> #      Sun Sep 20 16:53:42 2015 -0700
> # Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
> # Parent  9deecd57915da9ed217abafb81cf8cba506b2849
> revset: speed up existence checks for ordered filtered sets
> 
> Previously, calling 'if foo:' on a ordered filtered set would start iterating in
> whatever the current direction was and return if a value was available. If the
> current direction was ascending, but the set had a fastdesc available, this
> meant we did a lot more work than necessary.
> 
> If this was applied without my previous max/min fixes, it would improve max()
> performance (this was my first attempt at fixing the issue). Since those
> previous fixes went in though, this doesn't have a visible benefit in the
> benchmarks, but it does seem clearly better than it was before so I think it
> should still go in.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -3111,7 +3111,12 @@ class filteredset(abstractsmartset):
>          return lambda: self._iterfilter(it())
>  
>      def __nonzero__(self):
> -        for r in self:
> +        it = self
> +        fast = self.fastdesc or self.fastasc
> +        if fast:
> +            it = fast()
> +
> +        for r in it:
>              return True

I think the PATCH 1 is great, but I'm not sure about this.

If I understand it, this is fast only if the filtered revisions exist near
fastdesc side?

  condition = parents(. + .^)  # subset.fastdesc can reach the condition faster
  subset = fullreposet
Durham Goode - Sept. 21, 2015, 5:14 p.m.
On 9/21/15 9:25 AM, Yuya Nishihara wrote:
> On Sun, 20 Sep 2015 20:51:36 -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1442793222 25200
>> #      Sun Sep 20 16:53:42 2015 -0700
>> # Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
>> # Parent  9deecd57915da9ed217abafb81cf8cba506b2849
>> revset: speed up existence checks for ordered filtered sets
>>
>> Previously, calling 'if foo:' on a ordered filtered set would start iterating in
>> whatever the current direction was and return if a value was available. If the
>> current direction was ascending, but the set had a fastdesc available, this
>> meant we did a lot more work than necessary.
>>
>> If this was applied without my previous max/min fixes, it would improve max()
>> performance (this was my first attempt at fixing the issue). Since those
>> previous fixes went in though, this doesn't have a visible benefit in the
>> benchmarks, but it does seem clearly better than it was before so I think it
>> should still go in.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -3111,7 +3111,12 @@ class filteredset(abstractsmartset):
>>           return lambda: self._iterfilter(it())
>>   
>>       def __nonzero__(self):
>> -        for r in self:
>> +        it = self
>> +        fast = self.fastdesc or self.fastasc
>> +        if fast:
>> +            it = fast()
>> +
>> +        for r in it:
>>               return True
> I think the PATCH 1 is great, but I'm not sure about this.
>
> If I understand it, this is fast only if the filtered revisions exist near
> fastdesc side?
>
>    condition = parents(. + .^)  # subset.fastdesc can reach the condition faster
>    subset = fullreposet
We can reverse the preference order if you want ("fastasc or 
fastdesc").  That will maintain the old behavior of preferring ascending 
orders.  The point is that self.__iter__ is not known to be fast, so if 
we do have a fast iterator available we should use it.
Pierre-Yves David - Sept. 21, 2015, 6:51 p.m.
On 09/21/2015 09:25 AM, Yuya Nishihara wrote:
> On Sun, 20 Sep 2015 20:51:36 -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1442793222 25200
>> #      Sun Sep 20 16:53:42 2015 -0700
>> # Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
>> # Parent  9deecd57915da9ed217abafb81cf8cba506b2849
>> revset: speed up existence checks for ordered filtered sets
>>
>> Previously, calling 'if foo:' on a ordered filtered set would start iterating in
>> whatever the current direction was and return if a value was available. If the
>> current direction was ascending, but the set had a fastdesc available, this
>> meant we did a lot more work than necessary.
>>
>> If this was applied without my previous max/min fixes, it would improve max()
>> performance (this was my first attempt at fixing the issue). Since those
>> previous fixes went in though, this doesn't have a visible benefit in the
>> benchmarks, but it does seem clearly better than it was before so I think it
>> should still go in.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -3111,7 +3111,12 @@ class filteredset(abstractsmartset):
>>           return lambda: self._iterfilter(it())
>>
>>       def __nonzero__(self):
>> -        for r in self:
>> +        it = self
>> +        fast = self.fastdesc or self.fastasc
>> +        if fast:
>> +            it = fast()
>> +
>> +        for r in it:
>>               return True
>
> I think the PATCH 1 is great, but I'm not sure about this.
>
> If I understand it, this is fast only if the filtered revisions exist near
> fastdesc side?
>
>    condition = parents(. + .^)  # subset.fastdesc can reach the condition faster
>    subset = fullreposet

The fast version are supposed to be our best bet for a fast iteration 
and we use them as such in multiple place. Maybe a 'fastit' that would 
be able to pick betwen fast asc or fast desc would help, but I'm not 
sure how often we can do that.
Yuya Nishihara - Sept. 22, 2015, 4:28 a.m.
On Mon, 21 Sep 2015 10:14:21 -0700, Durham Goode wrote:
> On 9/21/15 9:25 AM, Yuya Nishihara wrote:
> > On Sun, 20 Sep 2015 20:51:36 -0700, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1442793222 25200
> >> #      Sun Sep 20 16:53:42 2015 -0700
> >> # Node ID 46a81d56dfabfc6ed8e1c08dae41b840574b8835
> >> # Parent  9deecd57915da9ed217abafb81cf8cba506b2849
> >> revset: speed up existence checks for ordered filtered sets
> >>
> >> Previously, calling 'if foo:' on a ordered filtered set would start iterating in
> >> whatever the current direction was and return if a value was available. If the
> >> current direction was ascending, but the set had a fastdesc available, this
> >> meant we did a lot more work than necessary.
> >>
> >> If this was applied without my previous max/min fixes, it would improve max()
> >> performance (this was my first attempt at fixing the issue). Since those
> >> previous fixes went in though, this doesn't have a visible benefit in the
> >> benchmarks, but it does seem clearly better than it was before so I think it
> >> should still go in.
> >>
> >> diff --git a/mercurial/revset.py b/mercurial/revset.py
> >> --- a/mercurial/revset.py
> >> +++ b/mercurial/revset.py
> >> @@ -3111,7 +3111,12 @@ class filteredset(abstractsmartset):
> >>           return lambda: self._iterfilter(it())
> >>   
> >>       def __nonzero__(self):
> >> -        for r in self:
> >> +        it = self
> >> +        fast = self.fastdesc or self.fastasc
> >> +        if fast:
> >> +            it = fast()
> >> +
> >> +        for r in it:
> >>               return True
> > I think the PATCH 1 is great, but I'm not sure about this.
> >
> > If I understand it, this is fast only if the filtered revisions exist near
> > fastdesc side?
> >
> >    condition = parents(. + .^)  # subset.fastdesc can reach the condition faster
> >    subset = fullreposet
>
> We can reverse the preference order if you want ("fastasc or 
> fastdesc").  That will maintain the old behavior of preferring ascending 
> orders.  The point is that self.__iter__ is not known to be fast, so if 
> we do have a fast iterator available we should use it.

I got the point that __iter__ can be slower than the fast versions, but I
don't think the iteration cost is that important here.

Revision:
0) @
1) self.fastdesc or self.fastasc (PATCH 2)
2) self.fastasc or self.fastdesc

revset #0: max(parents(. + .^) - (. + .^)  & ::@)
0) 0.038698
1) 0.000446   1%
2) 0.038745

revset #1: min(parents(10 + 10^) - (10 + 10^) & ::@)
0) 0.057440
1) 0.096196 167%
2) 0.057400

revset #2: max(author(lmoscovicz))
0) 1.364134
1) 0.317355  23%
2) 1.349448

revset #3: min(author(mason))
0) 0.033658
1) 1.390670  x41
2) 0.033108

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -3111,7 +3111,12 @@  class filteredset(abstractsmartset):
         return lambda: self._iterfilter(it())
 
     def __nonzero__(self):
-        for r in self:
+        it = self
+        fast = self.fastdesc or self.fastasc
+        if fast:
+            it = fast()
+
+        for r in it:
             return True
         return False