Patchwork [1,of,2] revset: remove existence check from min() and max()

login
register
mail settings
Submitter Durham Goode
Date Sept. 21, 2015, 3:51 a.m.
Message ID <9deecd57915da9ed217a.1442807495@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/10553/
State Superseded
Commit ade5c488d622c6113de501d06fff26ea528059d9
Headers show

Comments

Durham Goode - Sept. 21, 2015, 3:51 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1442802473 25200
#      Sun Sep 20 19:27:53 2015 -0700
# Node ID 9deecd57915da9ed217abafb81cf8cba506b2849
# Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
revset: remove existence check from min() and max()

min() and max() would first do an existence check. Unfortunately existence
checks can be slow in certain situations (like if the smartset is a list, and
quickly iterable in both ascending and descending directions, then doing an
existence check will start from the bottom, even if you want to check the
max()).

The fix is to not do the check, and just handle the error if it happens. In a
large repo, this speeds up:

hg log -r 'max(parents(. + .^) - (. + .^)  & ::master)'

from 3.5s to 0.85s. That revset is contrived and just for testing. In our
real case we used 'bundle()' in place of '. + .^'

Interesting perf numbers for the revset benchmarks:

max(draft() and ::tip) =>  0.027s to 0.0005s
max(author(lmoscovicz)) => 2.48s to 0.57s

min doesn't show any perf changes, but changing it as well will prevent a perf
regression in my next patch.
Pierre-Yves David - Sept. 21, 2015, 6:49 p.m.
On 09/20/2015 08:51 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1442802473 25200
> #      Sun Sep 20 19:27:53 2015 -0700
> # Node ID 9deecd57915da9ed217abafb81cf8cba506b2849
> # Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
> revset: remove existence check from min() and max()
>
> min() and max() would first do an existence check. Unfortunately existence
> checks can be slow in certain situations (like if the smartset is a list, and
> quickly iterable in both ascending and descending directions, then doing an
> existence check will start from the bottom, even if you want to check the
> max()).
>
> The fix is to not do the check, and just handle the error if it happens. In a
> large repo, this speeds up:
>
> hg log -r 'max(parents(. + .^) - (. + .^)  & ::master)'
>
> from 3.5s to 0.85s. That revset is contrived and just for testing. In our
> real case we used 'bundle()' in place of '. + .^'
>
> Interesting perf numbers for the revset benchmarks:
>
> max(draft() and ::tip) =>  0.027s to 0.0005s
> max(author(lmoscovicz)) => 2.48s to 0.57s
>
> min doesn't show any perf changes, but changing it as well will prevent a perf
> regression in my next patch.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1277,10 +1277,13 @@ def maxrev(repo, subset, x):
>       Changeset with highest revision number in set.
>       """
>       os = getset(repo, fullreposet(repo), x)
> -    if os:
> +    try:
>           m = os.max()
>           if m in subset:
>               return baseset([m])
> +    except ValueError:
> +        # os was an empty set
> +        pass
>       return baseset()
>
>   def merge(repo, subset, x):
> @@ -1316,10 +1319,13 @@ def minrev(repo, subset, x):
>       Changeset with lowest revision number in set.
>       """
>       os = getset(repo, fullreposet(repo), x)
> -    if os:
> +    try:
>           m = os.min()
>           if m in subset:
>               return baseset([m])
> +    except ValueError:

The Value error catching scares me a bit as too wide. Could we find a 
way to narrow that?
Durham Goode - Sept. 21, 2015, 8:27 p.m.
On 9/21/15 11:49 AM, Pierre-Yves David wrote:
>
>
> On 09/20/2015 08:51 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1442802473 25200
>> #      Sun Sep 20 19:27:53 2015 -0700
>> # Node ID 9deecd57915da9ed217abafb81cf8cba506b2849
>> # Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
>> revset: remove existence check from min() and max()
>>
>> min() and max() would first do an existence check. Unfortunately 
>> existence
>> checks can be slow in certain situations (like if the smartset is a 
>> list, and
>> quickly iterable in both ascending and descending directions, then 
>> doing an
>> existence check will start from the bottom, even if you want to check 
>> the
>> max()).
>>
>> The fix is to not do the check, and just handle the error if it 
>> happens. In a
>> large repo, this speeds up:
>>
>> hg log -r 'max(parents(. + .^) - (. + .^)  & ::master)'
>>
>> from 3.5s to 0.85s. That revset is contrived and just for testing. In 
>> our
>> real case we used 'bundle()' in place of '. + .^'
>>
>> Interesting perf numbers for the revset benchmarks:
>>
>> max(draft() and ::tip) =>  0.027s to 0.0005s
>> max(author(lmoscovicz)) => 2.48s to 0.57s
>>
>> min doesn't show any perf changes, but changing it as well will 
>> prevent a perf
>> regression in my next patch.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -1277,10 +1277,13 @@ def maxrev(repo, subset, x):
>>       Changeset with highest revision number in set.
>>       """
>>       os = getset(repo, fullreposet(repo), x)
>> -    if os:
>> +    try:
>>           m = os.max()
>>           if m in subset:
>>               return baseset([m])
>> +    except ValueError:
>> +        # os was an empty set
>> +        pass
>>       return baseset()
>>
>>   def merge(repo, subset, x):
>> @@ -1316,10 +1319,13 @@ def minrev(repo, subset, x):
>>       Changeset with lowest revision number in set.
>>       """
>>       os = getset(repo, fullreposet(repo), x)
>> -    if os:
>> +    try:
>>           m = os.min()
>>           if m in subset:
>>               return baseset([m])
>> +    except ValueError:
>
> The Value error catching scares me a bit as too wide. Could we find a 
> way to narrow that?
The python max() throws ValueError "arg is an empty sequence".  I can 
check for that string maybe, but would that change based on localization?
Matt Harbison - Sept. 23, 2015, 12:20 a.m.
On Mon, 21 Sep 2015 16:27:24 -0400, Durham Goode <durham@fb.com> wrote:

>
>
> On 9/21/15 11:49 AM, Pierre-Yves David wrote:
>>
>>
>> On 09/20/2015 08:51 PM, Durham Goode wrote:
>>> # HG changeset patch
>>> # User Durham Goode <durham@fb.com>
>>> # Date 1442802473 25200
>>> #      Sun Sep 20 19:27:53 2015 -0700
>>> # Node ID 9deecd57915da9ed217abafb81cf8cba506b2849
>>> # Parent  a0eff7ebc2aebb32cf4c8da276104102ff37d786
>>> revset: remove existence check from min() and max()
>>>
>>> min() and max() would first do an existence check. Unfortunately  
>>> existence
>>> checks can be slow in certain situations (like if the smartset is a  
>>> list, and
>>> quickly iterable in both ascending and descending directions, then  
>>> doing an
>>> existence check will start from the bottom, even if you want to check  
>>> the
>>> max()).
>>>
>>> The fix is to not do the check, and just handle the error if it  
>>> happens. In a
>>> large repo, this speeds up:
>>>
>>> hg log -r 'max(parents(. + .^) - (. + .^)  & ::master)'
>>>
>>> from 3.5s to 0.85s. That revset is contrived and just for testing. In  
>>> our
>>> real case we used 'bundle()' in place of '. + .^'
>>>
>>> Interesting perf numbers for the revset benchmarks:
>>>
>>> max(draft() and ::tip) =>  0.027s to 0.0005s
>>> max(author(lmoscovicz)) => 2.48s to 0.57s
>>>
>>> min doesn't show any perf changes, but changing it as well will  
>>> prevent a perf
>>> regression in my next patch.
>>>
>>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>>> --- a/mercurial/revset.py
>>> +++ b/mercurial/revset.py
>>> @@ -1277,10 +1277,13 @@ def maxrev(repo, subset, x):
>>>       Changeset with highest revision number in set.
>>>       """
>>>       os = getset(repo, fullreposet(repo), x)
>>> -    if os:
>>> +    try:
>>>           m = os.max()
>>>           if m in subset:
>>>               return baseset([m])
>>> +    except ValueError:
>>> +        # os was an empty set
>>> +        pass
>>>       return baseset()
>>>
>>>   def merge(repo, subset, x):
>>> @@ -1316,10 +1319,13 @@ def minrev(repo, subset, x):
>>>       Changeset with lowest revision number in set.
>>>       """
>>>       os = getset(repo, fullreposet(repo), x)
>>> -    if os:
>>> +    try:
>>>           m = os.min()
>>>           if m in subset:
>>>               return baseset([m])
>>> +    except ValueError:
>>
>> The Value error catching scares me a bit as too wide. Could we find a  
>> way to narrow that?
> The python max() throws ValueError "arg is an empty sequence".  I can  
> check for that string maybe, but would that change based on localization?

Could you check for empty set in the except clause, and re-raise if  
appropriate?
Pierre-Yves David - Sept. 23, 2015, 8:25 a.m.
On 09/22/2015 05:20 PM, Matt Harbison wrote:
> On Mon, 21 Sep 2015 16:27:24 -0400, Durham Goode <durham@fb.com> wrote:
>
>> On 9/21/15 11:49 AM, Pierre-Yves David wrote:
>>>
>>> On 09/20/2015 08:51 PM, Durham Goode wrote:
>>>> @@ -1316,10 +1319,13 @@ def minrev(repo, subset, x):
>>>>       Changeset with lowest revision number in set.
>>>>       """
>>>>       os = getset(repo, fullreposet(repo), x)
>>>> -    if os:
>>>> +    try:
>>>>           m = os.min()
>>>>           if m in subset:
>>>>               return baseset([m])
>>>> +    except ValueError:
>>>
>>> The Value error catching scares me a bit as too wide. Could we find a
>>> way to narrow that?
>> The python max() throws ValueError "arg is an empty sequence".  I can
>> check for that string maybe, but would that change based on localization?
>
> Could you check for empty set in the except clause, and re-raise if
> appropriate?

That's a good idea, you should have sugguested that on the V2 so I do 
not miss it.

Durham Can you follow up with that?

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1277,10 +1277,13 @@  def maxrev(repo, subset, x):
     Changeset with highest revision number in set.
     """
     os = getset(repo, fullreposet(repo), x)
-    if os:
+    try:
         m = os.max()
         if m in subset:
             return baseset([m])
+    except ValueError:
+        # os was an empty set
+        pass
     return baseset()
 
 def merge(repo, subset, x):
@@ -1316,10 +1319,13 @@  def minrev(repo, subset, x):
     Changeset with lowest revision number in set.
     """
     os = getset(repo, fullreposet(repo), x)
-    if os:
+    try:
         m = os.min()
         if m in subset:
             return baseset([m])
+    except ValueError:
+        # os was an empty set
+        pass
     return baseset()
 
 def modifies(repo, subset, x):