Patchwork [2,of,2] revset: make children only look at commits >= minrev

login
register
mail settings
Submitter Durham Goode
Date Sept. 23, 2014, 6:31 a.m.
Message ID <14087ac481b1d25ccaa6.1411453869@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/5915/
State Accepted
Headers show

Comments

Durham Goode - Sept. 23, 2014, 6:31 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1411450748 25200
#      Mon Sep 22 22:39:08 2014 -0700
# Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
# Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
revset: make children only look at commits >= minrev

Previously children() would iterate over every item in the subset, looking for
children of things in the parentset. We now request that the subset present
itself in descending order, so we can abort early once we see a rev that is <=
the minimum parent.

This, combined with the previous patch, shaves 40% off certain rebase times on
large repositories.  17 seconds -> 9 seconds.

revset #29: (children(ancestor(tip~5, tip)) and ::(tip~5))::
(@ commit)       0) wall 0.171047 comb 0.170000 user 0.170000 (best of 51)
(previous patch) 1) wall 0.155102 comb 0.150000 user 0.150000 (best of 55)
(this patch)     2) wall 0.028842 comb 0.030000 user 0.030000 (best of 100)
Augie Fackler - Sept. 23, 2014, 6:23 p.m.
On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1411450748 25200
> #      Mon Sep 22 22:39:08 2014 -0700
> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
> # Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
> revset: make children only look at commits >= minrev

Nice! Both queued, thanks.

>
> Previously children() would iterate over every item in the subset, looking for
> children of things in the parentset. We now request that the subset present
> itself in descending order, so we can abort early once we see a rev that is <=
> the minimum parent.
>
> This, combined with the previous patch, shaves 40% off certain rebase times on
> large repositories.  17 seconds -> 9 seconds.
>
> revset #29: (children(ancestor(tip~5, tip)) and ::(tip~5))::
> (@ commit)       0) wall 0.171047 comb 0.170000 user 0.170000 (best of 51)
> (previous patch) 1) wall 0.155102 comb 0.150000 user 0.150000 (best of 55)
> (this patch)     2) wall 0.028842 comb 0.030000 user 0.030000 (best of 100)
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -561,9 +561,10 @@
>          return baseset(cs)
>      pr = repo.changelog.parentrevs
>      minrev = min(parentset)
> +    narrow.descending()
>      for r in narrow:
>          if r <= minrev:
> -            continue
> +            break
>          for p in pr(r):
>              if p in parentset:
>                  cs.add(r)
> diff --git a/tests/test-histedit-obsolete.t b/tests/test-histedit-obsolete.t
> --- a/tests/test-histedit-obsolete.t
> +++ b/tests/test-histedit-obsolete.t
> @@ -431,9 +431,9 @@
>    0 files updated, 0 files merged, 2 files removed, 0 files unresolved
>    2 files updated, 0 files merged, 0 files removed, 0 files unresolved
>    0 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob)
> +  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob)
>    saved backup bundle to $TESTTMP/folding/.hg/strip-backup/58019c66f35f-backup.hg (glob)
> -  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob)
> -  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob)
>    $ hg log -G
>    @  19:f9daec13fb98 (secret) i
>    |
> diff --git a/tests/test-rebase-scenario-global.t b/tests/test-rebase-scenario-global.t
> --- a/tests/test-rebase-scenario-global.t
> +++ b/tests/test-rebase-scenario-global.t
> @@ -557,7 +557,7 @@
>    $ hg clone -q -u . ah ah6
>    $ cd ah6
>    $ hg rebase -r '(4+6)::' -d 1
> -  saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/3d8a618087a7-backup.hg (glob)
> +  saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/c01897464e7f-backup.hg (glob)
>    $ hg tglog
>    o  8: 'I'
>    |
> @@ -624,7 +624,7 @@
>  (actual test)
>
>    $ hg rebase --dest 'desc(G)' --rev 'desc(K) + desc(I)'
> -  saved backup bundle to $TESTTMP/a8/.hg/strip-backup/23a4ace37988-backup.hg (glob)
> +  saved backup bundle to $TESTTMP/a8/.hg/strip-backup/e7ec4e813ba6-backup.hg (glob)
>    $ hg log --rev 'children(desc(G))'
>    changeset:   9:adb617877056
>    parent:      6:eea13746799a
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - Sept. 23, 2014, 6:27 p.m.
On 09/23/2014 11:23 AM, Augie Fackler wrote:
> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1411450748 25200
>> #      Mon Sep 22 22:39:08 2014 -0700
>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
>> # Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
>> revset: make children only look at commits >= minrev
>
> Nice! Both queued, thanks.

Please don't. I really would like that the work on revset focus on 
fixing the core mechanism instead of working around its limitation by 
making the code more complex. Having more complex code will make it 
harder to get the benefit of improving the core mechanism.
Augie Fackler - Sept. 23, 2014, 6:31 p.m.
On Tue, Sep 23, 2014 at 2:27 PM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> Please don't. I really would like that the work on revset focus on fixing
> the core mechanism instead of working around its limitation by making the
> code more complex. Having more complex code will make it harder to get the
> benefit of improving the core mechanism.


Ack. I'll drop these two for now then.
Durham Goode - Sept. 23, 2014, 6:34 p.m.
On 9/23/14, 11:27 AM, Pierre-Yves David wrote:
>
>
> On 09/23/2014 11:23 AM, Augie Fackler wrote:
>> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote:
>>> # HG changeset patch
>>> # User Durham Goode <durham@fb.com>
>>> # Date 1411450748 25200
>>> #      Mon Sep 22 22:39:08 2014 -0700
>>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
>>> # Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
>>> revset: make children only look at commits >= minrev
>>
>> Nice! Both queued, thanks.
>
> Please don't. I really would like that the work on revset focus on 
> fixing the core mechanism instead of working around its limitation by 
> making the code more complex. Having more complex code will make it 
> harder to get the benefit of improving the core mechanism.
I don't think either of these make the code that much more complex, and 
the performance win is pretty massive.  In particular, patch #2 uses the 
smartness of the subset to perform it's work more efficiently, which is 
exactly what the classes are there to allow.

As for patch #1, I still think we should be making all our revsets 
'localwork & subset' instead of 'subset & localwork' because it 
logically indicates the better O(...).  For your fullreposet work, you 
could just have fullreposet.__contains__ return True to provide good 
performance (similar to your optimization for 'subset & localwork').
Durham Goode - Sept. 23, 2014, 11:35 p.m.
On 9/23/14, 11:34 AM, Durham Goode wrote:
>
> On 9/23/14, 11:27 AM, Pierre-Yves David wrote:
>>
>>
>> On 09/23/2014 11:23 AM, Augie Fackler wrote:
>>> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote:
>>>> # HG changeset patch
>>>> # User Durham Goode <durham@fb.com>
>>>> # Date 1411450748 25200
>>>> #      Mon Sep 22 22:39:08 2014 -0700
>>>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
>>>> # Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
>>>> revset: make children only look at commits >= minrev
>>>
>>> Nice! Both queued, thanks.
>>
>> Please don't. I really would like that the work on revset focus on 
>> fixing the core mechanism instead of working around its limitation by 
>> making the code more complex. Having more complex code will make it 
>> harder to get the benefit of improving the core mechanism.
> I don't think either of these make the code that much more complex, 
> and the performance win is pretty massive.  In particular, patch #2 
> uses the smartness of the subset to perform it's work more 
> efficiently, which is exactly what the classes are there to allow.
>
> As for patch #1, I still think we should be making all our revsets 
> 'localwork & subset' instead of 'subset & localwork' because it 
> logically indicates the better O(...).  For your fullreposet work, you 
> could just have fullreposet.__contains__ return True to provide good 
> performance (similar to your optimization for 'subset & localwork').
Pierre-Yves and I have come to a stalemate in real life, so perhaps the 
community will have an opinion.

I believe these changes are small enough and impactful enough to go in 
as is. They don't cause any regressions in the revset benchmarks, and 
are unlikely to cause significant regressions in any other cases.  While 
the same performance might be achievable by refactoring the underlying 
revset classes to be smarter, the current change is 3 lines, buys us a 
40% perf improvement, and doesn't make a future refactor any more difficult.
Olle Lundberg - Sept. 24, 2014, 8:15 a.m.
On Wed, Sep 24, 2014 at 1:35 AM, Durham Goode <durham@fb.com> wrote:

>
> On 9/23/14, 11:34 AM, Durham Goode wrote:
>
>>
>> On 9/23/14, 11:27 AM, Pierre-Yves David wrote:
>>
>>>
>>>
>>> On 09/23/2014 11:23 AM, Augie Fackler wrote:
>>>
>>>> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote:
>>>>
>>>>> # HG changeset patch
>>>>> # User Durham Goode <durham@fb.com>
>>>>> # Date 1411450748 25200
>>>>> #      Mon Sep 22 22:39:08 2014 -0700
>>>>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65
>>>>> # Parent  e7e0c696c29e1735aad7a77f82bfc987354a98c1
>>>>> revset: make children only look at commits >= minrev
>>>>>
>>>>
>>>> Nice! Both queued, thanks.
>>>>
>>>
>>> Please don't. I really would like that the work on revset focus on
>>> fixing the core mechanism instead of working around its limitation by
>>> making the code more complex. Having more complex code will make it harder
>>> to get the benefit of improving the core mechanism.
>>>
>> I don't think either of these make the code that much more complex, and
>> the performance win is pretty massive.  In particular, patch #2 uses the
>> smartness of the subset to perform it's work more efficiently, which is
>> exactly what the classes are there to allow.
>>
>> As for patch #1, I still think we should be making all our revsets
>> 'localwork & subset' instead of 'subset & localwork' because it logically
>> indicates the better O(...).  For your fullreposet work, you could just
>> have fullreposet.__contains__ return True to provide good performance
>> (similar to your optimization for 'subset & localwork').
>>
> Pierre-Yves and I have come to a stalemate in real life, so perhaps the
> community will have an opinion.
>
> I believe these changes are small enough and impactful enough to go in as
> is. They don't cause any regressions in the revset benchmarks, and are
> unlikely to cause significant regressions in any other cases.  While the
> same performance might be achievable by refactoring the underlying revset
> classes to be smarter, the current change is 3 lines, buys us a 40% perf
> improvement, and doesn't make a future refactor any more difficult.


Push it with an added comment about a future refactor? If we don't get the
full refactored revsets fixed until next release, we would at least have a
faster implementation and a reminder for a refactoring.

>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Augie Fackler - Sept. 24, 2014, 1:44 p.m.
On Wed, Sep 24, 2014 at 4:15 AM, Olle <olle.lundberg@gmail.com> wrote:
> Push it with an added comment about a future refactor? If we don't get the
> full refactored revsets fixed until next release, we would at least have a
> faster implementation and a reminder for a refactoring.


That works for me if Pierre-Yves doesn't object too much.
Pierre-Yves David - Sept. 24, 2014, 4:30 p.m.
On 09/24/2014 06:44 AM, Augie Fackler wrote:
> On Wed, Sep 24, 2014 at 4:15 AM, Olle <olle.lundberg@gmail.com> wrote:
>> Push it with an added comment about a future refactor? If we don't get the
>> full refactored revsets fixed until next release, we would at least have a
>> faster implementation and a reminder for a refactoring.
>
>
> That works for me if Pierre-Yves doesn't object too much.

I still object too much. (longer reply longer today)
Pierre-Yves David - Sept. 25, 2014, 12:09 a.m.
On 09/23/2014 04:35 PM, Durham Goode wrote:
> Pierre-Yves and I have come to a stalemate in real life, so perhaps the
> community will have an opinion.
>
> I believe these changes are small enough and impactful enough to go in
> as is. They don't cause any regressions in the revset benchmarks, and
> are unlikely to cause significant regressions in any other cases.  While
> the same performance might be achievable by refactoring the underlying
> revset classes to be smarter, the current change is 3 lines, buys us a
> 40% perf improvement, and doesn't make a future refactor any more
> difficult.

This change -do- makes it more difficult to improves revset in the 
future. But making revset function less homogeneous you are making it 
harder to take advantage of future improvement in the core class. You 
also make it harder to test the impact of improvement in those core 
class. Moreover by doing dirty optimization for the common case you hide 
obvious issue, letting them being expressed again later when someone add 
new usage of revset. Finally, doing benchmark and regression testing 
gets harder too. The more differently each revsets is, the harder is it 
to get a good picture of the performance impact with testing only a few 
revsets. If all revsets function build it result in a different way we 
end needing to test tens of combination for all of them which make such 
benchmark eventually impossible.

This is not just theoretical statement. This is my very experience of 
someone that works on revset and have to revert this kind of changes to 
be able to do so.

The lazy revset implementation is young and crappy. It has been done by 
an intern in a few months and left uncompleted. (no offense intended for 
the said intern, nothing unexpectedly bad there). What this 
implementation now needs is love, not hotfix. Improving dramatically the 
current implementation is -simple- and -cheap-. We had O(n²) stupidity 
in a common code path for 5 months, and it was solvable by a 3 line 
patches to a core class.

The case you are trying to address in your patch (smallest set should be 
used as the base when doing "and" operation) have been spotted multiple 
time (parents, children), affect multiple real world command (histedit, 
rebase, most probably evolve) And a lot of other revset could benefit 
from this approach (p1, p2, etc). There is obvious way to fix it once 
and for all in the handfull of smartset classes: makes the __and__ 
function aware of some length hint and use that to swap the operand when 
it seems smart to do so. Fixing your case using this approach is not a 
huge amount of work.Detecting that this issue is a pattern and deciding 
how solve it properly were most of the work and it is already here. You 
do not have to produce a complete implementation of this idea to solve 
your idea. Just touching the couple of classes involved in the case you 
are trying to optimize are enough. And this will speedup some other 
revset throughout the code base. Doing only part of this new logic is 
okay once you have started the work in that direction other people may 
complete the other bricks over time without having to do the whole 
diagnostic process you already did.

So please send the couple of few lines patches that would fix this issue 
the right way instead of building technical debt. More time it is 
necessary to produce such patches have been already invested in this debate.
Durham Goode - Sept. 25, 2014, 12:40 a.m.
On 9/25/14, 12:09 AM, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> 
wrote:

>

>

>On 09/23/2014 04:35 PM, Durham Goode wrote:

>> Pierre-Yves and I have come to a stalemate in real life, so perhaps the

>> community will have an opinion.

>>

>> I believe these changes are small enough and impactful enough to go in

>> as is. They don't cause any regressions in the revset benchmarks, and

>> are unlikely to cause significant regressions in any other cases.  While

>> the same performance might be achievable by refactoring the underlying

>> revset classes to be smarter, the current change is 3 lines, buys us a

>> 40% perf improvement, and doesn't make a future refactor any more

>> difficult.

>

>This change -do- makes it more difficult to improves revset in the 

>future. But making revset function less homogeneous you are making it 

>harder to take advantage of future improvement in the core class. You 

>also make it harder to test the impact of improvement in those core 

>class. Moreover by doing dirty optimization for the common case you hide 

>obvious issue, letting them being expressed again later when someone add 

>new usage of revset. Finally, doing benchmark and regression testing 

>gets harder too. The more differently each revsets is, the harder is it 

>to get a good picture of the performance impact with testing only a few 

>revsets. If all revsets function build it result in a different way we 

>end needing to test tens of combination for all of them which make such 

>benchmark eventually impossible.

>

>This is not just theoretical statement. This is my very experience of 

>someone that works on revset and have to revert this kind of changes to 

>be able to do so.

>

>The lazy revset implementation is young and crappy. It has been done by 

>an intern in a few months and left uncompleted. (no offense intended for 

>the said intern, nothing unexpectedly bad there). What this 

>implementation now needs is love, not hotfix. Improving dramatically the 

>current implementation is -simple- and -cheap-. We had O(n²) stupidity 

>in a common code path for 5 months, and it was solvable by a 3 line 

>patches to a core class.

>

>The case you are trying to address in your patch (smallest set should be 

>used as the base when doing "and" operation) have been spotted multiple 

>time (parents, children), affect multiple real world command (histedit, 

>rebase, most probably evolve) And a lot of other revset could benefit 

>from this approach (p1, p2, etc). There is obvious way to fix it once 

>and for all in the handfull of smartset classes: makes the __and__ 

>function aware of some length hint and use that to swap the operand when 

>it seems smart to do so. Fixing your case using this approach is not a 

>huge amount of work.Detecting that this issue is a pattern and deciding 

>how solve it properly were most of the work and it is already here. You 

>do not have to produce a complete implementation of this idea to solve 

>your idea. Just touching the couple of classes involved in the case you 

>are trying to optimize are enough. And this will speedup some other 

>revset throughout the code base. Doing only part of this new logic is 

>okay once you have started the work in that direction other people may 

>complete the other bricks over time without having to do the whole 

>diagnostic process you already did.

>

>So please send the couple of few lines patches that would fix this issue 

>the right way instead of building technical debt. More time it is 

>necessary to produce such patches have been already invested in this 

>debate.

>

>-- 

>Pierre-Yves David

>



The appropriate refactor would always be nice to have, but I have other 
things to be doing (converting tools to hg, building git equivalents, 
improving perf in other commands, etc) and they needed to be done 
yesterday. Even if I did do the refactor, it has the potential to cause 
wider side effects (possibly negative), then I’m spending even more time 
on the issue. I fight this, not because it’s a large amount of time, but 
because I don’t think we should reject massive perf wins just because it 
introduces minimal technical debt (and this is minimal. Anyone doing 
revset refactoring could flip the & trivially, and the proposed refactor 
would work the exact same even with my patches).

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -561,9 +561,10 @@ 
         return baseset(cs)
     pr = repo.changelog.parentrevs
     minrev = min(parentset)
+    narrow.descending()
     for r in narrow:
         if r <= minrev:
-            continue
+            break
         for p in pr(r):
             if p in parentset:
                 cs.add(r)
diff --git a/tests/test-histedit-obsolete.t b/tests/test-histedit-obsolete.t
--- a/tests/test-histedit-obsolete.t
+++ b/tests/test-histedit-obsolete.t
@@ -431,9 +431,9 @@ 
   0 files updated, 0 files merged, 2 files removed, 0 files unresolved
   2 files updated, 0 files merged, 0 files removed, 0 files unresolved
   0 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob)
+  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob)
   saved backup bundle to $TESTTMP/folding/.hg/strip-backup/58019c66f35f-backup.hg (glob)
-  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob)
-  saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob)
   $ hg log -G
   @  19:f9daec13fb98 (secret) i
   |
diff --git a/tests/test-rebase-scenario-global.t b/tests/test-rebase-scenario-global.t
--- a/tests/test-rebase-scenario-global.t
+++ b/tests/test-rebase-scenario-global.t
@@ -557,7 +557,7 @@ 
   $ hg clone -q -u . ah ah6
   $ cd ah6
   $ hg rebase -r '(4+6)::' -d 1
-  saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/3d8a618087a7-backup.hg (glob)
+  saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/c01897464e7f-backup.hg (glob)
   $ hg tglog
   o  8: 'I'
   |
@@ -624,7 +624,7 @@ 
 (actual test)
 
   $ hg rebase --dest 'desc(G)' --rev 'desc(K) + desc(I)'
-  saved backup bundle to $TESTTMP/a8/.hg/strip-backup/23a4ace37988-backup.hg (glob)
+  saved backup bundle to $TESTTMP/a8/.hg/strip-backup/e7ec4e813ba6-backup.hg (glob)
   $ hg log --rev 'children(desc(G))'
   changeset:   9:adb617877056
   parent:      6:eea13746799a