Submitter  | Boris Feld
Date       | Sept. 8, 2018, 10:56 a.m.
Message ID | <bdcfd96d87254a508e85.1536404215@localhost.localdomain>
Permalink  | /patch/34425/
State      | Accepted
Comments
On Sat, Sep 8, 2018 at 3:57 AM Boris Feld <boris.feld@octobus.net> wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1536087921 -7200
> #      Tue Sep 04 21:05:21 2018 +0200
> # Node ID bdcfd96d87254a508e85a6eb502ffe4c57845bc8
> # Parent  6268fed317d04dd8f0430468818ea5d0529b6503
> # EXP-Topic sparse-snapshot
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r bdcfd96d8725
> revlog: drop duplicated code

Queued this series.

Thank you for the detailed explanations in the commit messages and the
inline comments. It really helps to grok how all of this works.

I find this work around sparse revlogs and more intelligent delta chains
*very* interesting. And the results obviously speak for themselves.

It's worth noting that when we have alternate storage backends that may
not be based on append-only storage and linear access models, more complex
delta strategies like this make a *lot* more sense and, assuming the
performance impact is reasonable, should obviously be the preferred
mechanism.

I'm also a fan of limiting the max chain length: setting an upper bound on
the chain length starts to set an upper bound on the maximum work that
needs to be performed at read/write time. While not addressed by this
series, another potential area for improvement is setting a bounded upper
limit on the total size of stored deltas: since decompression and patch
application are somewhat linearly proportional to the input data size,
bounding the total size of data that needs to be decoded in order to
resolve a revision fulltext would potentially be a *better* bounding
mechanism than chain length itself. I'll be honest, I'm not sure what the
heuristic would be to determine what that bounded ceiling should be. IMO
it is worth investigating though.

Regarding performance, there are a handful of quadratic algorithms in this
code. We have several calls to deltaparent(), both directly and
indirectly. I feel like this will add significant overhead when adding
multiple revisions to large revlogs (>100,000 revisions). What I'm not
sure about is how this overhead compares to the overhead of resolving
parent revisions and computing deltas against them. I expect revision
resolution and delta computation to take most of the CPU. But at the same
time, quadratic algorithms coupled with Python overhead can make things
very slow. I look forward to the follow-up series addressing performance.

> This code probably got duplicated by a rebase/evolve conflict. We drop
> the extra copy; the other copy is right below.
>
> This had no real effect since other logic ensures that we never test the
> same revision twice.
>
> diff --git a/mercurial/revlogutils/deltas.py b/mercurial/revlogutils/deltas.py
> --- a/mercurial/revlogutils/deltas.py
> +++ b/mercurial/revlogutils/deltas.py
> @@ -621,19 +621,6 @@ def _rawgroups(revlog, p1, p2, cachedelt
>      curr = len(revlog)
>      prev = curr - 1
>
> -    # should we try to build a delta?
> -    if prev != nullrev and revlog._storedeltachains:
> -        tested = set()
> -        # This condition is true most of the time when processing
> -        # changegroup data into a generaldelta repo. The only time it
> -        # isn't true is if this is the first revision in a delta chain
> -        # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
> -        if cachedelta and gdelta and revlog._lazydeltabase:
> -            # Assume what we received from the server is a good choice
> -            # build delta will reuse the cache
> -            yield (cachedelta[0],)
> -            tested.add(cachedelta[0])
> -
>      # This condition is true most of the time when processing
>      # changegroup data into a generaldelta repo. The only time it
>      # isn't true is if this is the first revision in a delta chain
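To make the chain-length versus total-delta-size discussion above concrete, here is a minimal sketch of the kind of size-based bound being suggested. The chain representation and all names are hypothetical; this is illustrative only, not Mercurial's actual code or API.

def chain_within_read_budget(chain, fulltext_len, max_read_ratio=4.0):
    # Illustrative sketch only: accept a delta chain if the total amount of
    # delta data that must be decoded to rebuild the fulltext stays below
    # `max_read_ratio` times the fulltext size. `chain` is assumed to be a
    # list of (rev, compressed_delta_size) pairs; none of these names come
    # from Mercurial itself.
    total = sum(size for _rev, size in chain)
    return total <= max_read_ratio * fulltext_len

# A 3-link chain whose deltas add up to 5000 bytes for a 1000-byte fulltext
# would be rejected with the default ratio.
print(chain_within_read_budget([(10, 1500), (11, 1500), (12, 2000)], 1000))  # False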
On 10/09/2018 18:52, Gregory Szorc wrote:

> On Sat, Sep 8, 2018 at 3:57 AM Boris Feld <boris.feld@octobus.net> wrote:
>
> > # HG changeset patch
> > # User Boris Feld <boris.feld@octobus.net>
> > # Date 1536087921 -7200
> > #      Tue Sep 04 21:05:21 2018 +0200
> > # Node ID bdcfd96d87254a508e85a6eb502ffe4c57845bc8
> > # Parent  6268fed317d04dd8f0430468818ea5d0529b6503
> > # EXP-Topic sparse-snapshot
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r bdcfd96d8725
> > revlog: drop duplicated code
>
> Queued this series.
>
> Thank you for the detailed explanations in the commit messages and the
> inline comments. It really helps to grok how all of this works.
>
> I find this work around sparse revlogs and more intelligent delta chains
> *very* interesting. And the results obviously speak for themselves.

So… we discovered an embarrassing mistake introduced in 37957e07138c. Two
lines got swapped, and the delta algorithm was silently dropping its
investigation whenever an empty delta was encountered. This produced
significantly worse revlogs.

All the numbers in the last commit of this series were affected. This
includes the numbers for the new strategy, although to a lesser extent,
since it was able to work around the issue reasonably well.

We just gathered new numbers, and the intermediate-snapshot strategy still
holds up to its promise. In particular, the new approach resists delta
chain constraints well. It is still a choice worth considering in all
cases. However, it is no longer a pure win on all fronts, so we might have
to decide on a trade-off between the speed boost from shorter delta chains
and repository size (maybe slightly longer chains; we will have to gather
performance numbers).

Full detailed numbers are at the end of this email.

> It's worth noting that when we have alternate storage backends that may
> not be based on append-only storage and linear access models, more
> complex delta strategies like this make a *lot* more sense and, assuming
> the performance impact is reasonable, should obviously be the preferred
> mechanism.
>
> I'm also a fan of limiting the max chain length: setting an upper bound
> on the chain length starts to set an upper bound on the maximum work that
> needs to be performed at read/write time. While not addressed by this
> series, another potential area for improvement is setting a bounded upper
> limit on the total size of stored deltas: since decompression and patch
> application are somewhat linearly proportional to the input data size,
> bounding the total size of data that needs to be decoded in order to
> resolve a revision fulltext would potentially be a *better* bounding
> mechanism than chain length itself. I'll be honest, I'm not sure what the
> heuristic would be to determine what that bounded ceiling should be. IMO
> it is worth investigating though.

I'm not sure what you mean here? Bounding the size of the stored deltas
necessary to restore a revision is something we already do. It was the
first constraint on delta chains when revlog was introduced. Do you mean
something else?

> Regarding performance, there are a handful of quadratic algorithms in
> this code. We have several calls to deltaparent(), both directly and
> indirectly. I feel like this will add significant overhead when adding
> multiple revisions to large revlogs (>100,000 revisions). What I'm not
> sure about is how this overhead compares to the overhead of resolving
> parent revisions and computing deltas against them. I expect revision
> resolution and delta computation to take most of the CPU. But at the same
> time, quadratic algorithms coupled with Python overhead can make things
> very slow. I look forward to the follow-up series addressing performance.

For the full conversion, we seem to have a CPU increase that varies but
stays roughly under 50%; it needs better analysis. An important point is
that the search for an appropriate intermediate snapshot base only happens
when the parents were unsuitable bases. Since the main trigger for a new
base is now the delta chain length limit, this happens for roughly 1 in
1000 revisions. In addition, the large read performance boost we get from
the shorter delta chains has a large impact on the time we spend restoring
candidate bases for diffing, speeding up part of the process.

Full statistics of the conversions after fixing the bug from 37957e07138c:

mercurial

Manifest Size:
limit       |     none    |    1000
------------|-------------|------------
no-sparse   |   6 143 044 |  6 269 496
no-snapshot |   6 078 442 |  6 174 110
snapshot    |   5 798 796 |  5 827 025

Manifest Chain length data
limit       ||       none        ||       1000
value       || average |   max   || average |   max
------------||---------|---------||---------|---------
no-sparse   ||     429 |   1 397 ||     397 |   1 000
no-snapshot ||     430 |   1 397 ||     398 |   1 000
snapshot    ||     326 |   1 290 ||     313 |   1 000

Full Store Size
limit       |     none    |    1000
------------|-------------|------------
no-sparse   |  46 944 775 | 47 166 129
no-snapshot |  46 881 164 | 47 071 734
snapshot    |  46 622 445 | 46 723 774

pypy

Manifest Size:
limit       |     none    |    1000
------------|-------------|------------
no-sparse   |  52 941 760 | 56 200 970
no-snapshot |  28 934 814 | 34 410 548
snapshot    |  26 348 229 | 27 384 133

Manifest Chain length data
limit       ||       none        ||       1000
value       || average |   max   || average |   max
------------||---------|---------||---------|---------
no-sparse   ||     769 |   3 889 ||     390 |   1 000
no-snapshot ||   1 379 |   3 889 ||     501 |   1 000
snapshot    ||   1 223 |   3 846 ||     495 |   1 000

Full Store Size
limit       |     none    |    1000
------------|-------------|------------
no-sparse   | 336 050 203 | 339 309 413
no-snapshot | 312 193 772 | 317 669 506
snapshot    | 338 673 985 | 339 709 889

Mozilla

Manifest Size:
limit       |      none      |      1000
------------|----------------|---------------
no-sparse   |    215 096 339 | 1 708 853 525
no-snapshot |    199 492 686 | 1 590 880 707
snapshot    |    188 947 271 |   278 894 170

Manifest Chain length data
limit       ||       none        ||       1000
value       || average |   max   || average |   max
------------||---------|---------||---------|---------
no-sparse   ||  20 454 |  59 562 ||     491 |   1 000
no-snapshot ||  22 422 |  63 137 ||     489 |   1 000
snapshot    ||  23 509 |  69 891 ||     489 |   1 000

Full Store Size
limit       |      none      |      1000
------------|----------------|---------------
no-sparse   |  2 377 578 715 | 3 876 258 798
no-snapshot |  2 364 043 550 | 3 760 354 468
snapshot    |  2 441 677 137 | 2 535 997 381

Netbeans

Manifest Size:
limit       |      none      |      1000
------------|----------------|---------------
no-sparse   |    130 088 982 |   741 590 565
no-snapshot |    119 337 636 |   695 443 777
snapshot    |    118 836 887 |   159 161 207

Manifest Chain length data
limit       ||       none        ||       1000
value       || average |   max   || average |   max
------------||---------|---------||---------|---------
no-sparse   ||  19 321 |  61 397 ||     510 |   1 000
no-snapshot ||  21 572 |  61 397 ||     510 |   1 000
snapshot    ||  21 240 |  61 583 ||     503 |   1 000

Full Store Size
limit       |      none      |      1000
------------|----------------|---------------
no-sparse   |  1 160 013 008 | 1 771 514 591
no-snapshot |  1 149 360 447 | 1 725 466 588
snapshot    |  1 164 959 988 | 1 205 284 308

Private repo #1

Manifest Size:
limit       |       none      |      1000
------------|-----------------|----------------
no-sparse   |  33 725 285 081 | 33 724 834 190
no-snapshot |     390 302 545 |  1 891 428 641
snapshot    |     350 542 420 |    423 470 579

Manifest Chain length data
limit       ||       none        ||       1000
value       || average |   max   || average |   max
------------||---------|---------||---------|---------
no-sparse   ||         |         ||         |   1 000
no-snapshot ||         |         ||         |   1 000
snapshot    ||         |         ||         |   1 000

Full Store Size
limit       |       none      |      1000
------------|-----------------|----------------
no-sparse   |  41 544 149 652 | 41 543 698 761
no-snapshot |   8 152 000 440 |  9 653 126 536
snapshot    |   8 448 037 300 |  8 520 965 459
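For readers wondering how the chain-length figures above are defined, the sketch below shows one way such numbers can be computed. The input format and names are hypothetical, and this is not the script used to produce the tables: each revision's chain length is simply the number of delta hops back to a full snapshot.

def chain_lengths(deltaparent):
    # Illustrative only: `deltaparent` maps rev -> delta parent rev, with
    # None meaning the revision is stored as a full snapshot. Chain length
    # is the number of hops back to such a snapshot.
    lengths = {}
    for rev in sorted(deltaparent):
        parent = deltaparent[rev]
        lengths[rev] = 0 if parent is None else lengths[parent] + 1
    return lengths

lengths = chain_lengths({0: None, 1: 0, 2: 1, 3: None, 4: 3, 5: 4, 6: 5})
print(sum(lengths.values()) / len(lengths), max(lengths.values()))  # ~1.29 3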
Some numbers were missing near the end; here they are:

On 13/09/2018 16:20, Boris FELD wrote:
> Private repo #1
>
> Manifest Chain length data
> limit       ||       none        ||       1000
> value       || average |   max   || average |   max
> ------------||---------|---------||---------|---------
> no-sparse   ||     282 |   8 885 ||     113 |   1 000
> no-snapshot ||   3 475 |   8 885 ||     533 |   1 000
> snapshot    ||   3 655 |   8 951 ||     530 |   1 000
Patch
diff --git a/mercurial/revlogutils/deltas.py b/mercurial/revlogutils/deltas.py
--- a/mercurial/revlogutils/deltas.py
+++ b/mercurial/revlogutils/deltas.py
@@ -621,19 +621,6 @@ def _rawgroups(revlog, p1, p2, cachedelt
     curr = len(revlog)
     prev = curr - 1
 
-    # should we try to build a delta?
-    if prev != nullrev and revlog._storedeltachains:
-        tested = set()
-        # This condition is true most of the time when processing
-        # changegroup data into a generaldelta repo. The only time it
-        # isn't true is if this is the first revision in a delta chain
-        # or if ``format.generaldelta=true`` disabled ``lazydeltabase``.
-        if cachedelta and gdelta and revlog._lazydeltabase:
-            # Assume what we received from the server is a good choice
-            # build delta will reuse the cache
-            yield (cachedelta[0],)
-            tested.add(cachedelta[0])
-
     # This condition is true most of the time when processing
     # changegroup data into a generaldelta repo. The only time it
     # isn't true is if this is the first revision in a delta chain
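For context, the block removed above sat in front of a second, nearly identical block that yields the delta base suggested by the incoming changegroup as the first group of candidates. A simplified, self-contained sketch of that generator shape is given below; the names and structure are illustrative, not the real _rawgroups implementation.

def candidate_groups(parents, cached_base=None):
    # Simplified sketch of the shape being de-duplicated: candidate delta
    # bases are yielded in groups, starting with the base advertised by the
    # incoming data, then falling back to parents not yet tested.
    tested = set()
    if cached_base is not None:
        # Assume what we received from the server is a good choice.
        yield (cached_base,)
        tested.add(cached_base)
    group = tuple(p for p in parents if p not in tested)
    if group:
        yield group

for group in candidate_groups(parents=(12, 15), cached_base=15):
    print(group)  # prints (15,) then (12,)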