Submitter | Durham Goode |
---|---|
Date | Sept. 23, 2014, 6:31 a.m. |
Message ID | <14087ac481b1d25ccaa6.1411453869@dev2000.prn2.facebook.com> |
Download | mbox | patch |
Permalink | /patch/5915/ |
State | Accepted |
Headers | show |
Comments
On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1411450748 25200 > # Mon Sep 22 22:39:08 2014 -0700 > # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65 > # Parent e7e0c696c29e1735aad7a77f82bfc987354a98c1 > revset: make children only look at commits >= minrev Nice! Both queued, thanks. > > Previously children() would iterate over every item in the subset, looking for > children of things in the parentset. We now request that the subset present > itself in descending order, so we can abort early once we see a rev that is <= > the minimum parent. > > This, combined with the previous patch, shaves 40% off certain rebase times on > large repositories. 17 seconds -> 9 seconds. > > revset #29: (children(ancestor(tip~5, tip)) and ::(tip~5)):: > (@ commit) 0) wall 0.171047 comb 0.170000 user 0.170000 (best of 51) > (previous patch) 1) wall 0.155102 comb 0.150000 user 0.150000 (best of 55) > (this patch) 2) wall 0.028842 comb 0.030000 user 0.030000 (best of 100) > > diff --git a/mercurial/revset.py b/mercurial/revset.py > --- a/mercurial/revset.py > +++ b/mercurial/revset.py > @@ -561,9 +561,10 @@ > return baseset(cs) > pr = repo.changelog.parentrevs > minrev = min(parentset) > + narrow.descending() > for r in narrow: > if r <= minrev: > - continue > + break > for p in pr(r): > if p in parentset: > cs.add(r) > diff --git a/tests/test-histedit-obsolete.t b/tests/test-histedit-obsolete.t > --- a/tests/test-histedit-obsolete.t > +++ b/tests/test-histedit-obsolete.t > @@ -431,9 +431,9 @@ > 0 files updated, 0 files merged, 2 files removed, 0 files unresolved > 2 files updated, 0 files merged, 0 files removed, 0 files unresolved > 0 files updated, 0 files merged, 0 files removed, 0 files unresolved > + saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob) > + saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob) > saved backup bundle to $TESTTMP/folding/.hg/strip-backup/58019c66f35f-backup.hg (glob) > - saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob) > - saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob) > $ hg log -G > @ 19:f9daec13fb98 (secret) i > | > diff --git a/tests/test-rebase-scenario-global.t b/tests/test-rebase-scenario-global.t > --- a/tests/test-rebase-scenario-global.t > +++ b/tests/test-rebase-scenario-global.t > @@ -557,7 +557,7 @@ > $ hg clone -q -u . ah ah6 > $ cd ah6 > $ hg rebase -r '(4+6)::' -d 1 > - saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/3d8a618087a7-backup.hg (glob) > + saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/c01897464e7f-backup.hg (glob) > $ hg tglog > o 8: 'I' > | > @@ -624,7 +624,7 @@ > (actual test) > > $ hg rebase --dest 'desc(G)' --rev 'desc(K) + desc(I)' > - saved backup bundle to $TESTTMP/a8/.hg/strip-backup/23a4ace37988-backup.hg (glob) > + saved backup bundle to $TESTTMP/a8/.hg/strip-backup/e7ec4e813ba6-backup.hg (glob) > $ hg log --rev 'children(desc(G))' > changeset: 9:adb617877056 > parent: 6:eea13746799a > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On 09/23/2014 11:23 AM, Augie Fackler wrote: > On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote: >> # HG changeset patch >> # User Durham Goode <durham@fb.com> >> # Date 1411450748 25200 >> # Mon Sep 22 22:39:08 2014 -0700 >> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65 >> # Parent e7e0c696c29e1735aad7a77f82bfc987354a98c1 >> revset: make children only look at commits >= minrev > > Nice! Both queued, thanks. Please don't. I really would like that the work on revset focus on fixing the core mechanism instead of working around its limitation by making the code more complex. Having more complex code will make it harder to get the benefit of improving the core mechanism.
On Tue, Sep 23, 2014 at 2:27 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > Please don't. I really would like that the work on revset focus on fixing > the core mechanism instead of working around its limitation by making the > code more complex. Having more complex code will make it harder to get the > benefit of improving the core mechanism. Ack. I'll drop these two for now then.
On 9/23/14, 11:27 AM, Pierre-Yves David wrote: > > > On 09/23/2014 11:23 AM, Augie Fackler wrote: >> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote: >>> # HG changeset patch >>> # User Durham Goode <durham@fb.com> >>> # Date 1411450748 25200 >>> # Mon Sep 22 22:39:08 2014 -0700 >>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65 >>> # Parent e7e0c696c29e1735aad7a77f82bfc987354a98c1 >>> revset: make children only look at commits >= minrev >> >> Nice! Both queued, thanks. > > Please don't. I really would like that the work on revset focus on > fixing the core mechanism instead of working around its limitation by > making the code more complex. Having more complex code will make it > harder to get the benefit of improving the core mechanism. I don't think either of these make the code that much more complex, and the performance win is pretty massive. In particular, patch #2 uses the smartness of the subset to perform it's work more efficiently, which is exactly what the classes are there to allow. As for patch #1, I still think we should be making all our revsets 'localwork & subset' instead of 'subset & localwork' because it logically indicates the better O(...). For your fullreposet work, you could just have fullreposet.__contains__ return True to provide good performance (similar to your optimization for 'subset & localwork').
On 9/23/14, 11:34 AM, Durham Goode wrote: > > On 9/23/14, 11:27 AM, Pierre-Yves David wrote: >> >> >> On 09/23/2014 11:23 AM, Augie Fackler wrote: >>> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote: >>>> # HG changeset patch >>>> # User Durham Goode <durham@fb.com> >>>> # Date 1411450748 25200 >>>> # Mon Sep 22 22:39:08 2014 -0700 >>>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65 >>>> # Parent e7e0c696c29e1735aad7a77f82bfc987354a98c1 >>>> revset: make children only look at commits >= minrev >>> >>> Nice! Both queued, thanks. >> >> Please don't. I really would like that the work on revset focus on >> fixing the core mechanism instead of working around its limitation by >> making the code more complex. Having more complex code will make it >> harder to get the benefit of improving the core mechanism. > I don't think either of these make the code that much more complex, > and the performance win is pretty massive. In particular, patch #2 > uses the smartness of the subset to perform it's work more > efficiently, which is exactly what the classes are there to allow. > > As for patch #1, I still think we should be making all our revsets > 'localwork & subset' instead of 'subset & localwork' because it > logically indicates the better O(...). For your fullreposet work, you > could just have fullreposet.__contains__ return True to provide good > performance (similar to your optimization for 'subset & localwork'). Pierre-Yves and I have come to a stalemate in real life, so perhaps the community will have an opinion. I believe these changes are small enough and impactful enough to go in as is. They don't cause any regressions in the revset benchmarks, and are unlikely to cause significant regressions in any other cases. While the same performance might be achievable by refactoring the underlying revset classes to be smarter, the current change is 3 lines, buys us a 40% perf improvement, and doesn't make a future refactor any more difficult.
On Wed, Sep 24, 2014 at 1:35 AM, Durham Goode <durham@fb.com> wrote: > > On 9/23/14, 11:34 AM, Durham Goode wrote: > >> >> On 9/23/14, 11:27 AM, Pierre-Yves David wrote: >> >>> >>> >>> On 09/23/2014 11:23 AM, Augie Fackler wrote: >>> >>>> On Mon, Sep 22, 2014 at 11:31:09PM -0700, Durham Goode wrote: >>>> >>>>> # HG changeset patch >>>>> # User Durham Goode <durham@fb.com> >>>>> # Date 1411450748 25200 >>>>> # Mon Sep 22 22:39:08 2014 -0700 >>>>> # Node ID 14087ac481b1d25ccaa69ed15a014f56c829da65 >>>>> # Parent e7e0c696c29e1735aad7a77f82bfc987354a98c1 >>>>> revset: make children only look at commits >= minrev >>>>> >>>> >>>> Nice! Both queued, thanks. >>>> >>> >>> Please don't. I really would like that the work on revset focus on >>> fixing the core mechanism instead of working around its limitation by >>> making the code more complex. Having more complex code will make it harder >>> to get the benefit of improving the core mechanism. >>> >> I don't think either of these make the code that much more complex, and >> the performance win is pretty massive. In particular, patch #2 uses the >> smartness of the subset to perform it's work more efficiently, which is >> exactly what the classes are there to allow. >> >> As for patch #1, I still think we should be making all our revsets >> 'localwork & subset' instead of 'subset & localwork' because it logically >> indicates the better O(...). For your fullreposet work, you could just >> have fullreposet.__contains__ return True to provide good performance >> (similar to your optimization for 'subset & localwork'). >> > Pierre-Yves and I have come to a stalemate in real life, so perhaps the > community will have an opinion. > > I believe these changes are small enough and impactful enough to go in as > is. They don't cause any regressions in the revset benchmarks, and are > unlikely to cause significant regressions in any other cases. While the > same performance might be achievable by refactoring the underlying revset > classes to be smarter, the current change is 3 lines, buys us a 40% perf > improvement, and doesn't make a future refactor any more difficult. Push it with an added comment about a future refactor? If we don't get the full refactored revsets fixed until next release, we would at least have a faster implementation and a reminder for a refactoring. > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel >
On Wed, Sep 24, 2014 at 4:15 AM, Olle <olle.lundberg@gmail.com> wrote: > Push it with an added comment about a future refactor? If we don't get the > full refactored revsets fixed until next release, we would at least have a > faster implementation and a reminder for a refactoring. That works for me if Pierre-Yves doesn't object too much.
On 09/24/2014 06:44 AM, Augie Fackler wrote: > On Wed, Sep 24, 2014 at 4:15 AM, Olle <olle.lundberg@gmail.com> wrote: >> Push it with an added comment about a future refactor? If we don't get the >> full refactored revsets fixed until next release, we would at least have a >> faster implementation and a reminder for a refactoring. > > > That works for me if Pierre-Yves doesn't object too much. I still object too much. (longer reply longer today)
On 09/23/2014 04:35 PM, Durham Goode wrote: > Pierre-Yves and I have come to a stalemate in real life, so perhaps the > community will have an opinion. > > I believe these changes are small enough and impactful enough to go in > as is. They don't cause any regressions in the revset benchmarks, and > are unlikely to cause significant regressions in any other cases. While > the same performance might be achievable by refactoring the underlying > revset classes to be smarter, the current change is 3 lines, buys us a > 40% perf improvement, and doesn't make a future refactor any more > difficult. This change -do- makes it more difficult to improves revset in the future. But making revset function less homogeneous you are making it harder to take advantage of future improvement in the core class. You also make it harder to test the impact of improvement in those core class. Moreover by doing dirty optimization for the common case you hide obvious issue, letting them being expressed again later when someone add new usage of revset. Finally, doing benchmark and regression testing gets harder too. The more differently each revsets is, the harder is it to get a good picture of the performance impact with testing only a few revsets. If all revsets function build it result in a different way we end needing to test tens of combination for all of them which make such benchmark eventually impossible. This is not just theoretical statement. This is my very experience of someone that works on revset and have to revert this kind of changes to be able to do so. The lazy revset implementation is young and crappy. It has been done by an intern in a few months and left uncompleted. (no offense intended for the said intern, nothing unexpectedly bad there). What this implementation now needs is love, not hotfix. Improving dramatically the current implementation is -simple- and -cheap-. We had O(n²) stupidity in a common code path for 5 months, and it was solvable by a 3 line patches to a core class. The case you are trying to address in your patch (smallest set should be used as the base when doing "and" operation) have been spotted multiple time (parents, children), affect multiple real world command (histedit, rebase, most probably evolve) And a lot of other revset could benefit from this approach (p1, p2, etc). There is obvious way to fix it once and for all in the handfull of smartset classes: makes the __and__ function aware of some length hint and use that to swap the operand when it seems smart to do so. Fixing your case using this approach is not a huge amount of work.Detecting that this issue is a pattern and deciding how solve it properly were most of the work and it is already here. You do not have to produce a complete implementation of this idea to solve your idea. Just touching the couple of classes involved in the case you are trying to optimize are enough. And this will speedup some other revset throughout the code base. Doing only part of this new logic is okay once you have started the work in that direction other people may complete the other bricks over time without having to do the whole diagnostic process you already did. So please send the couple of few lines patches that would fix this issue the right way instead of building technical debt. More time it is necessary to produce such patches have been already invested in this debate.
On 9/25/14, 12:09 AM, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote: > > >On 09/23/2014 04:35 PM, Durham Goode wrote: >> Pierre-Yves and I have come to a stalemate in real life, so perhaps the >> community will have an opinion. >> >> I believe these changes are small enough and impactful enough to go in >> as is. They don't cause any regressions in the revset benchmarks, and >> are unlikely to cause significant regressions in any other cases. While >> the same performance might be achievable by refactoring the underlying >> revset classes to be smarter, the current change is 3 lines, buys us a >> 40% perf improvement, and doesn't make a future refactor any more >> difficult. > >This change -do- makes it more difficult to improves revset in the >future. But making revset function less homogeneous you are making it >harder to take advantage of future improvement in the core class. You >also make it harder to test the impact of improvement in those core >class. Moreover by doing dirty optimization for the common case you hide >obvious issue, letting them being expressed again later when someone add >new usage of revset. Finally, doing benchmark and regression testing >gets harder too. The more differently each revsets is, the harder is it >to get a good picture of the performance impact with testing only a few >revsets. If all revsets function build it result in a different way we >end needing to test tens of combination for all of them which make such >benchmark eventually impossible. > >This is not just theoretical statement. This is my very experience of >someone that works on revset and have to revert this kind of changes to >be able to do so. > >The lazy revset implementation is young and crappy. It has been done by >an intern in a few months and left uncompleted. (no offense intended for >the said intern, nothing unexpectedly bad there). What this >implementation now needs is love, not hotfix. Improving dramatically the >current implementation is -simple- and -cheap-. We had O(n²) stupidity >in a common code path for 5 months, and it was solvable by a 3 line >patches to a core class. > >The case you are trying to address in your patch (smallest set should be >used as the base when doing "and" operation) have been spotted multiple >time (parents, children), affect multiple real world command (histedit, >rebase, most probably evolve) And a lot of other revset could benefit >from this approach (p1, p2, etc). There is obvious way to fix it once >and for all in the handfull of smartset classes: makes the __and__ >function aware of some length hint and use that to swap the operand when >it seems smart to do so. Fixing your case using this approach is not a >huge amount of work.Detecting that this issue is a pattern and deciding >how solve it properly were most of the work and it is already here. You >do not have to produce a complete implementation of this idea to solve >your idea. Just touching the couple of classes involved in the case you >are trying to optimize are enough. And this will speedup some other >revset throughout the code base. Doing only part of this new logic is >okay once you have started the work in that direction other people may >complete the other bricks over time without having to do the whole >diagnostic process you already did. > >So please send the couple of few lines patches that would fix this issue >the right way instead of building technical debt. More time it is >necessary to produce such patches have been already invested in this >debate. > >-- >Pierre-Yves David > The appropriate refactor would always be nice to have, but I have other things to be doing (converting tools to hg, building git equivalents, improving perf in other commands, etc) and they needed to be done yesterday. Even if I did do the refactor, it has the potential to cause wider side effects (possibly negative), then I’m spending even more time on the issue. I fight this, not because it’s a large amount of time, but because I don’t think we should reject massive perf wins just because it introduces minimal technical debt (and this is minimal. Anyone doing revset refactoring could flip the & trivially, and the proposed refactor would work the exact same even with my patches).
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -561,9 +561,10 @@ return baseset(cs) pr = repo.changelog.parentrevs minrev = min(parentset) + narrow.descending() for r in narrow: if r <= minrev: - continue + break for p in pr(r): if p in parentset: cs.add(r) diff --git a/tests/test-histedit-obsolete.t b/tests/test-histedit-obsolete.t --- a/tests/test-histedit-obsolete.t +++ b/tests/test-histedit-obsolete.t @@ -431,9 +431,9 @@ 0 files updated, 0 files merged, 2 files removed, 0 files unresolved 2 files updated, 0 files merged, 0 files removed, 0 files unresolved 0 files updated, 0 files merged, 0 files removed, 0 files unresolved + saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob) + saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob) saved backup bundle to $TESTTMP/folding/.hg/strip-backup/58019c66f35f-backup.hg (glob) - saved backup bundle to $TESTTMP/folding/.hg/strip-backup/83d1858e070b-backup.hg (glob) - saved backup bundle to $TESTTMP/folding/.hg/strip-backup/859969f5ed7e-backup.hg (glob) $ hg log -G @ 19:f9daec13fb98 (secret) i | diff --git a/tests/test-rebase-scenario-global.t b/tests/test-rebase-scenario-global.t --- a/tests/test-rebase-scenario-global.t +++ b/tests/test-rebase-scenario-global.t @@ -557,7 +557,7 @@ $ hg clone -q -u . ah ah6 $ cd ah6 $ hg rebase -r '(4+6)::' -d 1 - saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/3d8a618087a7-backup.hg (glob) + saved backup bundle to $TESTTMP/ah6/.hg/strip-backup/c01897464e7f-backup.hg (glob) $ hg tglog o 8: 'I' | @@ -624,7 +624,7 @@ (actual test) $ hg rebase --dest 'desc(G)' --rev 'desc(K) + desc(I)' - saved backup bundle to $TESTTMP/a8/.hg/strip-backup/23a4ace37988-backup.hg (glob) + saved backup bundle to $TESTTMP/a8/.hg/strip-backup/e7ec4e813ba6-backup.hg (glob) $ hg log --rev 'children(desc(G))' changeset: 9:adb617877056 parent: 6:eea13746799a