Patchwork [STABLE] rebase: improve base revset performance

login
register
mail settings
Submitter Durham Goode
Date Oct. 21, 2014, 2:11 a.m.
Message ID <1a9990727f5d75d70bb0.1413857472@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/6432/
State Accepted
Headers show

Comments

Durham Goode - Oct. 21, 2014, 2:11 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1413856209 25200
#      Mon Oct 20 18:50:09 2014 -0700
# Node ID 1a9990727f5d75d70bb0d00fdffac571449574b6
# Parent  67cb1ab1ad1db6a66b96ac0c0333256b5383a3a2
rebase: improve base revset performance

The old revset had pretty terrible performance on large repositories (12+
seconds). This new revset achieves the same result in only 0.7s. As we improve
the underlying revset APIs we can probably get this revset down to 'only(base,
dest)::', but at the moment that version still takes 2s.
Mads Kiilerich - Oct. 21, 2014, 3:15 a.m.
On 10/21/2014 04:11 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1413856209 25200
> #      Mon Oct 20 18:50:09 2014 -0700
> # Node ID 1a9990727f5d75d70bb0d00fdffac571449574b6
> # Parent  67cb1ab1ad1db6a66b96ac0c0333256b5383a3a2
> rebase: improve base revset performance
>
> The old revset had pretty terrible performance on large repositories (12+
> seconds). This new revset achieves the same result in only 0.7s. As we improve
> the underlying revset APIs we can probably get this revset down to 'only(base,
> dest)::', but at the moment that version still takes 2s.
>
> diff --git a/hgext/rebase.py b/hgext/rebase.py
> --- a/hgext/rebase.py
> +++ b/hgext/rebase.py
> @@ -272,9 +272,9 @@ def rebase(ui, repo, **opts):
>                       ui.status(_('empty "base" revision set - '
>                                   "can't compute rebase set\n"))
>                       return 1
> -                rebaseset = repo.revs(
> -                    '(children(ancestor(%ld, %d)) and ::(%ld))::',
> -                    base, dest, base)
> +                commonanc = repo.revs('ancestor(%ld, %d)', base, dest).first()
> +                rebaseset = repo.revs('(%d::(%ld) - %d)::',
> +                                      commonanc, base, commonanc)

Nice, but not really suitable for stable according to the normal policy. 
Let's see.

I guess the trick is at a::b has very efficient pruning. Very clever.

The () around %ld is just cargo cult, right?

[Here I would have said something important. But it turned out that I 
didn't have anything important to say. Only that it is an "interesting" 
challenge to optimize the old expression and the proposed expressions to 
make them as fast as this rewrite. So I will just say:]

LGTM.

/Mads
Pierre-Yves David - Oct. 21, 2014, 5:25 a.m.
On 10/20/2014 07:11 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1413856209 25200
> #      Mon Oct 20 18:50:09 2014 -0700
> # Node ID 1a9990727f5d75d70bb0d00fdffac571449574b6
> # Parent  67cb1ab1ad1db6a66b96ac0c0333256b5383a3a2
> rebase: improve base revset performance
>
> The old revset had pretty terrible performance on large repositories (12+
> seconds). This new revset achieves the same result in only 0.7s. As we improve
> the underlying revset APIs we can probably get this revset down to 'only(base,
> dest)::', but at the moment that version still takes 2s.
>
> diff --git a/hgext/rebase.py b/hgext/rebase.py
> --- a/hgext/rebase.py
> +++ b/hgext/rebase.py
> @@ -272,9 +272,9 @@ def rebase(ui, repo, **opts):
>                       ui.status(_('empty "base" revision set - '
>                                   "can't compute rebase set\n"))
>                       return 1
> -                rebaseset = repo.revs(
> -                    '(children(ancestor(%ld, %d)) and ::(%ld))::',
> -                    base, dest, base)
> +                commonanc = repo.revs('ancestor(%ld, %d)', base, dest).first()
> +                rebaseset = repo.revs('(%d::(%ld) - %d)::',
> +                                      commonanc, base, commonanc)

So we suffer here from the cost of `::(X)` and the unability of storing 
the ancestors call in a variable.

I wonder if if we can get similar performance by limiting the walk of 
`::(X)` with the min of the subset.

There is probably no value for going through revset got the ancestors 
call. We could call the ancestors code directly.

Not sure about stable suitability.

>                   if not rebaseset:
>                       # transform to list because smartsets are not comparable to
>                       # lists. This should be improved to honor lazyness of
Durham Goode - Oct. 21, 2014, 4:49 p.m.
On 10/20/14 10:25 PM, Pierre-Yves David wrote:
>
>
> On 10/20/2014 07:11 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1413856209 25200
>> #      Mon Oct 20 18:50:09 2014 -0700
>> # Node ID 1a9990727f5d75d70bb0d00fdffac571449574b6
>> # Parent  67cb1ab1ad1db6a66b96ac0c0333256b5383a3a2
>> rebase: improve base revset performance
>>
>> The old revset had pretty terrible performance on large repositories 
>> (12+
>> seconds). This new revset achieves the same result in only 0.7s. As 
>> we improve
>> the underlying revset APIs we can probably get this revset down to 
>> 'only(base,
>> dest)::', but at the moment that version still takes 2s.
>>
>> diff --git a/hgext/rebase.py b/hgext/rebase.py
>> --- a/hgext/rebase.py
>> +++ b/hgext/rebase.py
>> @@ -272,9 +272,9 @@ def rebase(ui, repo, **opts):
>>                       ui.status(_('empty "base" revision set - '
>>                                   "can't compute rebase set\n"))
>>                       return 1
>> -                rebaseset = repo.revs(
>> -                    '(children(ancestor(%ld, %d)) and ::(%ld))::',
>> -                    base, dest, base)
>> +                commonanc = repo.revs('ancestor(%ld, %d)', base, 
>> dest).first()
>> +                rebaseset = repo.revs('(%d::(%ld) - %d)::',
>> +                                      commonanc, base, commonanc)
>
> So we suffer here from the cost of `::(X)` and the unability of 
> storing the ancestors call in a variable.
>
The common ancestor is very cheap to compute, so storing it in a 
variable was just so I didn't have to duplicate the ancestor() bit in 
the revset in an ugly way, not really for performance.
> I wonder if if we can get similar performance by limiting the walk of 
> `::(X)` with the min of the subset.
>
In theory the revset code should be smart enough to do this efficiently, 
but that change is too risky for the freeze.
> There is probably no value for going through revset got the ancestors 
> call. We could call the ancestors code directly.
>
I wanted to minimize the deviation from the old code.  Using 
ancestor.whatever() might have subtle differences I'm not aware of.
> Not sure about stable suitability.
I'll let Matt decide.  He seemed interested in taking my old original 
fix for stable before the freeze, so I figured this would qualify.
Matt Mackall - Oct. 22, 2014, 11:30 p.m.
On Mon, 2014-10-20 at 19:11 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1413856209 25200
> #      Mon Oct 20 18:50:09 2014 -0700
> # Node ID 1a9990727f5d75d70bb0d00fdffac571449574b6
> # Parent  67cb1ab1ad1db6a66b96ac0c0333256b5383a3a2
> rebase: improve base revset performance

Queued for stable, thanks.

Patch

diff --git a/hgext/rebase.py b/hgext/rebase.py
--- a/hgext/rebase.py
+++ b/hgext/rebase.py
@@ -272,9 +272,9 @@  def rebase(ui, repo, **opts):
                     ui.status(_('empty "base" revision set - '
                                 "can't compute rebase set\n"))
                     return 1
-                rebaseset = repo.revs(
-                    '(children(ancestor(%ld, %d)) and ::(%ld))::',
-                    base, dest, base)
+                commonanc = repo.revs('ancestor(%ld, %d)', base, dest).first()
+                rebaseset = repo.revs('(%d::(%ld) - %d)::',
+                                      commonanc, base, commonanc)
                 if not rebaseset:
                     # transform to list because smartsets are not comparable to
                     # lists. This should be improved to honor lazyness of