Patchwork discovery: be more conservative when adjusting the sample size

login
register
mail settings
Submitter Pierre-Yves David
Date June 5, 2019, 10:36 a.m.
Message ID <2c67430451fafcdd6877.1559730963@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/40320/
State Superseded
Headers show

Comments

Pierre-Yves David - June 5, 2019, 10:36 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1559726605 -7200
#      Wed Jun 05 11:23:25 2019 +0200
# Node ID 2c67430451fafcdd68770436c520e2d008428986
# Parent  12793787439538411013edffe0f9b98762d38a37
# EXP-Topic discovery-large-undecided
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
discovery: be more conservative when adjusting the sample size

Since 5b34972a0094, the discovery will increase the sample size when it detect a
"complex" undecided set. However this detection focussed on the number of roots
only, this could regress discovery performance when the undecided set has many
roots that eventually get merged into a few heads.

To prevent such misbehavior, we adjust the logic to take in account both heads
and roots. The sample size will be increased only if both are especially large.

Performance testing on the same case as 5b34972a0094, does not show a
significant difference.
Yuya Nishihara - June 5, 2019, 12:37 p.m.
On Wed, 05 Jun 2019 12:36:03 +0200, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1559726605 -7200
> #      Wed Jun 05 11:23:25 2019 +0200
> # Node ID 2c67430451fafcdd68770436c520e2d008428986
> # Parent  12793787439538411013edffe0f9b98762d38a37
> # EXP-Topic discovery-large-undecided
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
> discovery: be more conservative when adjusting the sample size
> 
> Since 5b34972a0094, the discovery will increase the sample size when it detect a
> "complex" undecided set. However this detection focussed on the number of roots
> only, this could regress discovery performance when the undecided set has many
> roots that eventually get merged into a few heads.
> 
> To prevent such misbehavior, we adjust the logic to take in account both heads
> and roots. The sample size will be increased only if both are especially large.
> 
> Performance testing on the same case as 5b34972a0094, does not show a
> significant difference.
> 
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -236,14 +236,19 @@ class partialdiscovery(object):
>          sample = set(repo.revs('heads(%ld)', revs))
>          parentrevs = self._parentsgetter()
>  
> +        ### update from both directions
> +        revsroots = set(repo.revs('roots(%ld)', revs))
> +        nbroots = len(revsroots)
> +        revsheads = sample.copy()
> +        nbheads = len(revsheads)
> +
> +        if not self._respectsize:
> +            size = max(size, min(nbroots, nbheads))
> +
>          # update from heads
> -        revsheads = sample.copy()
>          _updatesample(revs, revsheads, sample, parentrevs)
>  
>          # update from roots
> -        revsroots = set(repo.revs('roots(%ld)', revs))
> -        if not self._respectsize:
> -            size = max(size, len(revsroots))

Looks good, but any reason to move these codes around? IIUC, _updatesample()
never mutates revsheads.
via Mercurial-devel - June 5, 2019, 1:21 p.m.
On Wed, Jun 5, 2019, 03:36 Pierre-Yves David <pierre-yves.david@ens-lyon.org>
wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1559726605 -7200
> #      Wed Jun 05 11:23:25 2019 +0200
> # Node ID 2c67430451fafcdd68770436c520e2d008428986
> # Parent  12793787439538411013edffe0f9b98762d38a37
> # EXP-Topic discovery-large-undecided
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 2c67430451fa
> discovery: be more conservative when adjusting the sample size
>
> Since 5b34972a0094, the discovery will increase the sample size when it
> detect a
> "complex" undecided set. However this detection focussed on the number of
> roots
> only, this could regress discovery performance when the undecided set has
> many
> roots that eventually get merged into a few heads.
>

Seems unlikely to happen in practice. That's why I didn't bother with it in
my patch.

Also, the performance regression you're talking about is on slow networks,
right? I.e., it's not that more round trips or more CPU would be required;
the concern is rather that it takes time to transfer the nodeids on slow
networks, right?

So if you have some CI system that merges 100k heads into one in order to
tests all proposed changes together, and it's on slow network, it would
benefit from this patch?


> To prevent such misbehavior, we adjust the logic to take in account both
> heads
> and roots. The sample size will be increased only if both are especially
> large.
>
> Performance testing on the same case as 5b34972a0094, does not show a
> significant difference.
>
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -236,14 +236,19 @@ class partialdiscovery(object):
>          sample = set(repo.revs('heads(%ld)', revs))
>          parentrevs = self._parentsgetter()
>
> +        ### update from both directions
> +        revsroots = set(repo.revs('roots(%ld)', revs))
> +        nbroots = len(revsroots)
> +        revsheads = sample.copy()
> +        nbheads = len(revsheads)
> +
> +        if not self._respectsize:
> +            size = max(size, min(nbroots, nbheads))
> +
>          # update from heads
> -        revsheads = sample.copy()
>          _updatesample(revs, revsheads, sample, parentrevs)
>
>          # update from roots
> -        revsroots = set(repo.revs('roots(%ld)', revs))
> -        if not self._respectsize:
> -            size = max(size, len(revsroots))
>
>          childrenrevs = self._childrengetter()
>
>
Pierre-Yves David - June 5, 2019, 3:02 p.m.
On 6/5/19 2:37 PM, Yuya Nishihara wrote:
> On Wed, 05 Jun 2019 12:36:03 +0200, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>> # Date 1559726605 -7200
>> #      Wed Jun 05 11:23:25 2019 +0200
>> # Node ID 2c67430451fafcdd68770436c520e2d008428986
>> # Parent  12793787439538411013edffe0f9b98762d38a37
>> # EXP-Topic discovery-large-undecided
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
>> discovery: be more conservative when adjusting the sample size
>>
>> Since 5b34972a0094, the discovery will increase the sample size when it detect a
>> "complex" undecided set. However this detection focussed on the number of roots
>> only, this could regress discovery performance when the undecided set has many
>> roots that eventually get merged into a few heads.
>>
>> To prevent such misbehavior, we adjust the logic to take in account both heads
>> and roots. The sample size will be increased only if both are especially large.
>>
>> Performance testing on the same case as 5b34972a0094, does not show a
>> significant difference.
>>
>> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>> --- a/mercurial/setdiscovery.py
>> +++ b/mercurial/setdiscovery.py
>> @@ -236,14 +236,19 @@ class partialdiscovery(object):
>>           sample = set(repo.revs('heads(%ld)', revs))
>>           parentrevs = self._parentsgetter()
>>   
>> +        ### update from both directions
>> +        revsroots = set(repo.revs('roots(%ld)', revs))
>> +        nbroots = len(revsroots)
>> +        revsheads = sample.copy()
>> +        nbheads = len(revsheads)
>> +
>> +        if not self._respectsize:
>> +            size = max(size, min(nbroots, nbheads))
>> +
>>           # update from heads
>> -        revsheads = sample.copy()
>>           _updatesample(revs, revsheads, sample, parentrevs)
>>   
>>           # update from roots
>> -        revsroots = set(repo.revs('roots(%ld)', revs))
>> -        if not self._respectsize:
>> -            size = max(size, len(revsroots))
> 
> Looks good, but any reason to move these codes around? IIUC, _updatesample()
> never mutates revsheads.

The update to size need to happen before both `_updatesample(…,revheads, 
…)` and `_updatesample(…,revroots, …)` so the code has to move higher. 
For consistency I am gathering them all in one place.

Does this answer your question ?
Pierre-Yves David - June 5, 2019, 3:16 p.m.
On 6/5/19 3:21 PM, Martin von Zweigbergk wrote:
> 
> 
> On Wed, Jun 5, 2019, 03:36 Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>     # Date 1559726605 -7200
>     #      Wed Jun 05 11:23:25 2019 +0200
>     # Node ID 2c67430451fafcdd68770436c520e2d008428986
>     # Parent  12793787439538411013edffe0f9b98762d38a37
>     # EXP-Topic discovery-large-undecided
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
>     discovery: be more conservative when adjusting the sample size
> 
>     Since 5b34972a0094, the discovery will increase the sample size when
>     it detect a
>     "complex" undecided set. However this detection focussed on the
>     number of roots
>     only, this could regress discovery performance when the undecided
>     set has many
>     roots that eventually get merged into a few heads.
> 
> 
> Seems unlikely to happen in practice. That's why I didn't bother with it 
> in my patch.

There are very common use case where this happens (eg: many feature 
branch eventually merged into one head for integration testing).

> Also, the performance regression you're talking about is on slow 
> networks, right? I.e., it's not that more round trips or more CPU would 
> be required; the concern is rather that it takes time to transfer the 
> nodeids on slow networks, right?

I am concern about suddenly sending much more information in case where 
we did not before and it would not help the discovery. Both in term of 
bandwidth and processing on the server side.

> So if you have some CI system that merges 100k heads into one in order 
> to tests all proposed changes together, and it's on slow network, it 
> would benefit from this patch?

The idea behind this patch is to adjust the code to be more in the 
spirit of the initial patch. If we want to try and cheaply adjust 
behavior for situation where there are probably many disconnected set 
while keeping the existing behavior for large connected set.
This patch make a small change to better detect case of large connected 
set. If there are few heads or few roots there cannot be many 
disconnected set.
Yuya Nishihara - June 5, 2019, 11:33 p.m.
On Wed, 5 Jun 2019 17:02:36 +0200, Pierre-Yves David wrote:
> 
> 
> On 6/5/19 2:37 PM, Yuya Nishihara wrote:
> > On Wed, 05 Jun 2019 12:36:03 +0200, Pierre-Yves David wrote:
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> >> # Date 1559726605 -7200
> >> #      Wed Jun 05 11:23:25 2019 +0200
> >> # Node ID 2c67430451fafcdd68770436c520e2d008428986
> >> # Parent  12793787439538411013edffe0f9b98762d38a37
> >> # EXP-Topic discovery-large-undecided
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
> >> discovery: be more conservative when adjusting the sample size
> >>
> >> Since 5b34972a0094, the discovery will increase the sample size when it detect a
> >> "complex" undecided set. However this detection focussed on the number of roots
> >> only, this could regress discovery performance when the undecided set has many
> >> roots that eventually get merged into a few heads.
> >>
> >> To prevent such misbehavior, we adjust the logic to take in account both heads
> >> and roots. The sample size will be increased only if both are especially large.
> >>
> >> Performance testing on the same case as 5b34972a0094, does not show a
> >> significant difference.
> >>
> >> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> >> --- a/mercurial/setdiscovery.py
> >> +++ b/mercurial/setdiscovery.py
> >> @@ -236,14 +236,19 @@ class partialdiscovery(object):
> >>           sample = set(repo.revs('heads(%ld)', revs))
> >>           parentrevs = self._parentsgetter()
> >>   
> >> +        ### update from both directions
> >> +        revsroots = set(repo.revs('roots(%ld)', revs))
> >> +        nbroots = len(revsroots)
> >> +        revsheads = sample.copy()
> >> +        nbheads = len(revsheads)
> >> +
> >> +        if not self._respectsize:
> >> +            size = max(size, min(nbroots, nbheads))
> >> +
> >>           # update from heads
> >> -        revsheads = sample.copy()
> >>           _updatesample(revs, revsheads, sample, parentrevs)
> >>   
> >>           # update from roots
> >> -        revsroots = set(repo.revs('roots(%ld)', revs))
> >> -        if not self._respectsize:
> >> -            size = max(size, len(revsroots))
> > 
> > Looks good, but any reason to move these codes around? IIUC, _updatesample()
> > never mutates revsheads.
> 
> The update to size need to happen before both `_updatesample(…,revheads, 
> …)` and `_updatesample(…,revroots, …)` so the code has to move higher. 
> For consistency I am gathering them all in one place.
> 
> Does this answer your question ?

Not really. The size isn't used until _limitsample(). Anyway, that isn't
important, but I felt some orphaned comment looked weird.
via Mercurial-devel - June 6, 2019, 5:40 a.m.
On Wed, Jun 5, 2019 at 8:16 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 6/5/19 3:21 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed, Jun 5, 2019, 03:36 Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>
> > wrote:
> >
> >     # HG changeset patch
> >     # User Pierre-Yves David <pierre-yves.david@octobus.net
> >     <mailto:pierre-yves.david@octobus.net>>
> >     # Date 1559726605 -7200
> >     #      Wed Jun 05 11:23:25 2019 +0200
> >     # Node ID 2c67430451fafcdd68770436c520e2d008428986
> >     # Parent  12793787439538411013edffe0f9b98762d38a37
> >     # EXP-Topic discovery-large-undecided
> >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >     #              hg pull
> >     https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
> >     discovery: be more conservative when adjusting the sample size
> >
> >     Since 5b34972a0094, the discovery will increase the sample size when
> >     it detect a
> >     "complex" undecided set. However this detection focussed on the
> >     number of roots
> >     only, this could regress discovery performance when the undecided
> >     set has many
> >     roots that eventually get merged into a few heads.
> >
> >
> > Seems unlikely to happen in practice. That's why I didn't bother with it
> > in my patch.
>
> There are very common use case where this happens (eg: many feature
> branch eventually merged into one head for integration testing).
>

Right, that's the example I gave. I didn't mean that it seems unlikely that
there are fewer heads than roots, only that it seems unlikely that there's
a significant difference between them.


> > Also, the performance regression you're talking about is on slow
> > networks, right? I.e., it's not that more round trips or more CPU would
> > be required; the concern is rather that it takes time to transfer the
> > nodeids on slow networks, right?
>
> I am concern about suddenly sending much more information in case where
> we did not before and it would not help the discovery. Both in term of
> bandwidth and processing on the server side.
>
> > So if you have some CI system that merges 100k heads into one in order
> > to tests all proposed changes together, and it's on slow network, it
> > would benefit from this patch?
>
> The idea behind this patch is to adjust the code to be more in the
> spirit of the initial patch. If we want to try and cheaply adjust
> behavior for situation where there are probably many disconnected set
> while keeping the existing behavior for large connected set.
> This patch make a small change to better detect case of large connected
> set. If there are few heads or few roots there cannot be many
> disconnected set.
>
> --
> Pierre-Yves David
>
Pierre-Yves David - June 6, 2019, 8:41 a.m.
On 6/6/19 1:33 AM, Yuya Nishihara wrote:
> On Wed, 5 Jun 2019 17:02:36 +0200, Pierre-Yves David wrote:
>>
>>
>> On 6/5/19 2:37 PM, Yuya Nishihara wrote:
>>> On Wed, 05 Jun 2019 12:36:03 +0200, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>>>> # Date 1559726605 -7200
>>>> #      Wed Jun 05 11:23:25 2019 +0200
>>>> # Node ID 2c67430451fafcdd68770436c520e2d008428986
>>>> # Parent  12793787439538411013edffe0f9b98762d38a37
>>>> # EXP-Topic discovery-large-undecided
>>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
>>>> discovery: be more conservative when adjusting the sample size
>>>>
>>>> Since 5b34972a0094, the discovery will increase the sample size when it detect a
>>>> "complex" undecided set. However this detection focussed on the number of roots
>>>> only, this could regress discovery performance when the undecided set has many
>>>> roots that eventually get merged into a few heads.
>>>>
>>>> To prevent such misbehavior, we adjust the logic to take in account both heads
>>>> and roots. The sample size will be increased only if both are especially large.
>>>>
>>>> Performance testing on the same case as 5b34972a0094, does not show a
>>>> significant difference.
>>>>
>>>> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>>>> --- a/mercurial/setdiscovery.py
>>>> +++ b/mercurial/setdiscovery.py
>>>> @@ -236,14 +236,19 @@ class partialdiscovery(object):
>>>>            sample = set(repo.revs('heads(%ld)', revs))
>>>>            parentrevs = self._parentsgetter()
>>>>    
>>>> +        ### update from both directions
>>>> +        revsroots = set(repo.revs('roots(%ld)', revs))
>>>> +        nbroots = len(revsroots)
>>>> +        revsheads = sample.copy()
>>>> +        nbheads = len(revsheads)
>>>> +
>>>> +        if not self._respectsize:
>>>> +            size = max(size, min(nbroots, nbheads))
>>>> +
>>>>            # update from heads
>>>> -        revsheads = sample.copy()
>>>>            _updatesample(revs, revsheads, sample, parentrevs)
>>>>    
>>>>            # update from roots
>>>> -        revsroots = set(repo.revs('roots(%ld)', revs))
>>>> -        if not self._respectsize:
>>>> -            size = max(size, len(revsroots))
>>>
>>> Looks good, but any reason to move these codes around? IIUC, _updatesample()
>>> never mutates revsheads.
>>
>> The update to size need to happen before both `_updatesample(…,revheads,
>> …)` and `_updatesample(…,revroots, …)` so the code has to move higher.
>> For consistency I am gathering them all in one place.
>>
>> Does this answer your question ?
> 
> Not really. The size isn't used until _limitsample(). Anyway, that isn't
> important, but I felt some orphaned comment looked weird.

Ha! I see your point. V2 coming.
Pierre-Yves David - June 6, 2019, 8:42 a.m.
On 6/6/19 7:40 AM, Martin von Zweigbergk wrote:
> 
> 
> On Wed, Jun 5, 2019 at 8:16 AM Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
> 
> 
>     On 6/5/19 3:21 PM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Wed, Jun 5, 2019, 03:36 Pierre-Yves David
>      > <pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>
>     <mailto:pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>>
>      > wrote:
>      >
>      >     # HG changeset patch
>      >     # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>
>      >     <mailto:pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>>
>      >     # Date 1559726605 -7200
>      >     #      Wed Jun 05 11:23:25 2019 +0200
>      >     # Node ID 2c67430451fafcdd68770436c520e2d008428986
>      >     # Parent  12793787439538411013edffe0f9b98762d38a37
>      >     # EXP-Topic discovery-large-undecided
>      >     # Available At https://bitbucket.org/octobus/mercurial-devel/
>      >     #              hg pull
>      > https://bitbucket.org/octobus/mercurial-devel/ -r 2c67430451fa
>      >     discovery: be more conservative when adjusting the sample size
>      >
>      >     Since 5b34972a0094, the discovery will increase the sample
>     size when
>      >     it detect a
>      >     "complex" undecided set. However this detection focussed on the
>      >     number of roots
>      >     only, this could regress discovery performance when the undecided
>      >     set has many
>      >     roots that eventually get merged into a few heads.
>      >
>      >
>      > Seems unlikely to happen in practice. That's why I didn't bother
>     with it
>      > in my patch.
> 
>     There are very common use case where this happens (eg: many feature
>     branch eventually merged into one head for integration testing).
> 
> 
> Right, that's the example I gave. I didn't mean that it seems unlikely 
> that there are fewer heads than roots, only that it seems unlikely that 
> there's a significant difference between them.

I saw two different cases where this is the case on a regular basis ;-)

Patch

diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
--- a/mercurial/setdiscovery.py
+++ b/mercurial/setdiscovery.py
@@ -236,14 +236,19 @@  class partialdiscovery(object):
         sample = set(repo.revs('heads(%ld)', revs))
         parentrevs = self._parentsgetter()
 
+        ### update from both directions
+        revsroots = set(repo.revs('roots(%ld)', revs))
+        nbroots = len(revsroots)
+        revsheads = sample.copy()
+        nbheads = len(revsheads)
+
+        if not self._respectsize:
+            size = max(size, min(nbroots, nbheads))
+
         # update from heads
-        revsheads = sample.copy()
         _updatesample(revs, revsheads, sample, parentrevs)
 
         # update from roots
-        revsroots = set(repo.revs('roots(%ld)', revs))
-        if not self._respectsize:
-            size = max(size, len(revsroots))
 
         childrenrevs = self._childrengetter()