Patchwork [3,of,3] revset: fast implementation for fullreposet.__and__

login
register
mail settings
Submitter Pierre-Yves David
Date Sept. 20, 2014, 12:07 a.m.
Message ID <f9806f19d2d158c228ae.1411171657@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/5892/
State Superseded
Commit 911f5a6579d1f7c320220d5688b2b89c875909c1
Headers show

Comments

Pierre-Yves David - Sept. 20, 2014, 12:07 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1398878911 25200
#      Wed Apr 30 10:28:31 2014 -0700
# Node ID f9806f19d2d158c228ae7207371afa42475ed946
# Parent  5aec2be794fbd94adc4f624c57cdd2fc9907e6e3
revset: fast implementation for fullreposet.__and__

"And" operation with something that contains the whole repo should be super
cheap. Check method docstring for details.

This provide massive boost to simple revset that use `subset & xxx`

revset #0: p1(20000)
0) wall 0.002447 comb 0.010000 user 0.010000 sys 0.000000 (best of 767)
1) wall 0.000529 comb 0.000000 user 0.000000 sys 0.000000 (best of 3947)

revset #1: p2(10000)
0) wall 0.002464 comb 0.000000 user 0.000000 sys 0.000000 (best of 913)
1) wall 0.000530 comb 0.000000 user 0.000000 sys 0.000000 (best of 4226)

No other regression spotted.

More performance are expected  in the future as more revset predicate are
converted to use `subset & xxx`
Durham Goode - Sept. 23, 2014, 12:10 a.m.
On 9/19/14, 5:07 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1398878911 25200
> #      Wed Apr 30 10:28:31 2014 -0700
> # Node ID f9806f19d2d158c228ae7207371afa42475ed946
> # Parent  5aec2be794fbd94adc4f624c57cdd2fc9907e6e3
> revset: fast implementation for fullreposet.__and__
>
> "And" operation with something that contains the whole repo should be super
> cheap. Check method docstring for details.
>
> This provide massive boost to simple revset that use `subset & xxx`
>
> revset #0: p1(20000)
> 0) wall 0.002447 comb 0.010000 user 0.010000 sys 0.000000 (best of 767)
> 1) wall 0.000529 comb 0.000000 user 0.000000 sys 0.000000 (best of 3947)
>
> revset #1: p2(10000)
> 0) wall 0.002464 comb 0.000000 user 0.000000 sys 0.000000 (best of 913)
> 1) wall 0.000530 comb 0.000000 user 0.000000 sys 0.000000 (best of 4226)
>
> No other regression spotted.
>
> More performance are expected  in the future as more revset predicate are
> converted to use `subset & xxx`
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2865,7 +2865,29 @@ class fullreposet(_spanset):
>       """
>   
>       def __init__(self, repo):
>           super(fullreposet, self).__init__(repo)
>   
> +    def __and__(self, other):
> +        """fullrepo & other -> other
> +
> +        As self contains the whole repo, all of the other set should also be in
> +        self. Therefor `self & other = other`.
> +
> +        This boldly assume the other contains valid revs only.
>
I feel like other revsets may depend on the spanset to do the hidden 
commit filtering, and this would remove that.  Have you run the 
evolution tests with this enabled?
Pierre-Yves David - Sept. 23, 2014, 12:18 a.m.
On 09/22/2014 05:10 PM, Durham Goode wrote:
>
> On 9/19/14, 5:07 PM, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1398878911 25200
>> #      Wed Apr 30 10:28:31 2014 -0700
>> # Node ID f9806f19d2d158c228ae7207371afa42475ed946
>> # Parent  5aec2be794fbd94adc4f624c57cdd2fc9907e6e3
>> revset: fast implementation for fullreposet.__and__
>>
>> "And" operation with something that contains the whole repo should be
>> super
>> cheap. Check method docstring for details.
>>
>> This provide massive boost to simple revset that use `subset & xxx`
>>
>> revset #0: p1(20000)
>> 0) wall 0.002447 comb 0.010000 user 0.010000 sys 0.000000 (best of 767)
>> 1) wall 0.000529 comb 0.000000 user 0.000000 sys 0.000000 (best of 3947)
>>
>> revset #1: p2(10000)
>> 0) wall 0.002464 comb 0.000000 user 0.000000 sys 0.000000 (best of 913)
>> 1) wall 0.000530 comb 0.000000 user 0.000000 sys 0.000000 (best of 4226)
>>
>> No other regression spotted.
>>
>> More performance are expected  in the future as more revset predicate are
>> converted to use `subset & xxx`
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2865,7 +2865,29 @@ class fullreposet(_spanset):
>>       """
>>       def __init__(self, repo):
>>           super(fullreposet, self).__init__(repo)
>> +    def __and__(self, other):
>> +        """fullrepo & other -> other
>> +
>> +        As self contains the whole repo, all of the other set should
>> also be in
>> +        self. Therefor `self & other = other`.
>> +
>> +        This boldly assume the other contains valid revs only.
>>
> I feel like other revsets may depend on the spanset to do the hidden
> commit filtering, and this would remove that.

If such exists, I would says that the bug is in the revsets. Are you 
aware of any?

>  Have you run the evolution tests with this enabled?

Not yet, doing it now, (I have to fix a couple of unrelated changes in 
evolve test first)
Durham Goode - Sept. 23, 2014, 12:47 a.m.
On 9/22/14, 5:18 PM, Pierre-Yves David wrote:
>
>
> On 09/22/2014 05:10 PM, Durham Goode wrote:
>>
>> On 9/19/14, 5:07 PM, Pierre-Yves David wrote:
>>> +        This boldly assume the other contains valid revs only.
>>>
>> I feel like other revsets may depend on the spanset to do the hidden
>> commit filtering, and this would remove that.
>
> If such exists, I would says that the bug is in the revsets. Are you 
> aware of any?
>
>>  Have you run the evolution tests with this enabled?
>
> Not yet, doing it now, (I have to fix a couple of unrelated changes in 
> evolve test first)
>
_intlist seems to assume the given revs are fine.  I wouldn't be 
surprised if just passing a rev number also just created a set with the 
rev in it without checking if it's filtered first.

filelog() might have the same issue, since it returns linkrevs 
directly.  Though it uses filter instead of &, so it's probably ok for now.

It might be worth having an exception or debug assert or something for a 
bit to flush out issues?
Pierre-Yves David - Sept. 23, 2014, 12:50 a.m.
On 09/22/2014 05:10 PM, Durham Goode wrote:
>
> On 9/19/14, 5:07 PM, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1398878911 25200
>> #      Wed Apr 30 10:28:31 2014 -0700
>> # Node ID f9806f19d2d158c228ae7207371afa42475ed946
>> # Parent  5aec2be794fbd94adc4f624c57cdd2fc9907e6e3
>> revset: fast implementation for fullreposet.__and__
>>
>> "And" operation with something that contains the whole repo should be
>> super
>> cheap. Check method docstring for details.
>>
>> This provide massive boost to simple revset that use `subset & xxx`
>>
>> revset #0: p1(20000)
>> 0) wall 0.002447 comb 0.010000 user 0.010000 sys 0.000000 (best of 767)
>> 1) wall 0.000529 comb 0.000000 user 0.000000 sys 0.000000 (best of 3947)
>>
>> revset #1: p2(10000)
>> 0) wall 0.002464 comb 0.000000 user 0.000000 sys 0.000000 (best of 913)
>> 1) wall 0.000530 comb 0.000000 user 0.000000 sys 0.000000 (best of 4226)
>>
>> No other regression spotted.
>>
>> More performance are expected  in the future as more revset predicate are
>> converted to use `subset & xxx`
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2865,7 +2865,29 @@ class fullreposet(_spanset):
>>       """
>>       def __init__(self, repo):
>>           super(fullreposet, self).__init__(repo)
>> +    def __and__(self, other):
>> +        """fullrepo & other -> other
>> +
>> +        As self contains the whole repo, all of the other set should
>> also be in
>> +        self. Therefor `self & other = other`.
>> +
>> +        This boldly assume the other contains valid revs only.
>>
> I feel like other revsets may depend on the spanset to do the hidden
> commit filtering, and this would remove that.  Have you run the
> evolution tests with this enabled?

So the obsolete() (and related revset are subject to this error) It 
won't be too hard to fix for then.

However we can also think of a cleaner way to handle this. Smartset 
class. with content known to be pure could have a "smartset.pure == 
True" that trigger the optimisation.

I'm not so concerned about having this issue in the revset code itself. 
I'm more concerned about having people using the return of `repo.revs()` 
and receiving a fullreposet. So another option would be to disable this 
optimization before returning it to the "user"
Pierre-Yves David - Sept. 23, 2014, 1:40 a.m.
On 09/22/2014 05:47 PM, Durham Goode wrote:
>
> On 9/22/14, 5:18 PM, Pierre-Yves David wrote:
>>
>>
>> On 09/22/2014 05:10 PM, Durham Goode wrote:
>>>
>>> On 9/19/14, 5:07 PM, Pierre-Yves David wrote:
>>>> +        This boldly assume the other contains valid revs only.
>>>>
>>> I feel like other revsets may depend on the spanset to do the hidden
>>> commit filtering, and this would remove that.
>>
>> If such exists, I would says that the bug is in the revsets. Are you
>> aware of any?
>>
>>>  Have you run the evolution tests with this enabled?
>>
>> Not yet, doing it now, (I have to fix a couple of unrelated changes in
>> evolve test first)
>>
> _intlist seems to assume the given revs are fine.  I wouldn't be
> surprised if just passing a rev number also just created a set with the
> rev in it without checking if it's filtered first.
>
> filelog() might have the same issue, since it returns linkrevs
> directly.  Though it uses filter instead of &, so it's probably ok for now.
>
> It might be worth having an exception or debug assert or something for a
> bit to flush out issues?

After IRL discussion with Durham we resolved to add a check in the 
__and__ method to check if any filtered revision exists in the other 
set. (And filter them out if they do)

This should remove most of the core bug related to this change.

However, this won't help for value below 0 and above unfiltered tip. so 
we'll have to so something about it at some point.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2865,7 +2865,29 @@  class fullreposet(_spanset):
     """
 
     def __init__(self, repo):
         super(fullreposet, self).__init__(repo)
 
+    def __and__(self, other):
+        """fullrepo & other -> other
+
+        As self contains the whole repo, all of the other set should also be in
+        self. Therefor `self & other = other`.
+
+        This boldly assume the other contains valid revs only.
+        """
+        # dif not a smartset, make is so
+        if not util.safehasattr(other, 'set'):
+            other = baseset(other)
+
+        # preserve order:
+        #
+        # this is probably useless and harmful in multiple case but this match
+        # the current behavior.
+        if self.isascending():
+            other.ascending()
+        else:
+            other.descending()
+        return other
+
 # tell hggettext to extract docstrings from these functions:
 i18nfunctions = symbols.values()