Patchwork discovery: slowly increase sampling size

Submitter Pierre-Yves David
Date May 21, 2019, 11:21 a.m.
Message ID <123267b121ea268b3bc4.1558437717@nodosa.octopoid.net>
Permalink /patch/40164/
State New

Comments

Pierre-Yves David - May 21, 2019, 11:21 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1558436902 -7200
#      Tue May 21 13:08:22 2019 +0200
# Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
# Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
# EXP-Topic discovery
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
discovery: slowly increase sampling size

Some pathological discovery runs can require many roundtrips. When this
happens, things can get very slow.

To make the algorithm more resilient against such pathological cases, we
slowly increase the sample size with each roundtrip (+5%). This will have a
negligible impact on "normal" discovery with few roundtrips, but a large
positive impact on cases with many roundtrips. Asking more questions per
roundtrip helps reduce the undecided set faster: instead of shrinking the
undecided set at a linear speed (in the worst case), we shrink it at a
guaranteed (small) exponential rate. The data below show this slow ramp-up
in sample size:

round trip    |    1 |     5 |    10 |    20 |     50 |     100 |       130 |
sample size   |  200 |   254 |   321 |   517 |  2 199 |  25 123 |   108 549 |
covered nodes |  200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2 276 755 |
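
As a concrete illustration, the following standalone sketch (plain Python,
not Mercurial code; it models only the growth itself, ignoring the initial
quick-sample rounds and the limitedarguments guard) reproduces the table by
compounding the sample size once per roundtrip with int(size * growth), the
same operation the patch applies to fullsamplesize. The round-1 row above is
the initial sample, before any growth:

    def rampup(rounds, start=200, growth=1.05):
        # Grow the sample size once per roundtrip, truncating to an
        # integer exactly like int(fullsamplesize * samplegrowth).
        size = start
        covered = start  # total nodes asked about so far
        for _ in range(rounds):
            size = int(size * growth)
            covered += size
        return size, covered

    for n in (5, 10, 20, 50, 100, 130):
        print(n, *rampup(n))
    # rampup(5) == (254, 1357), rampup(10) == (321, 2821),
    # rampup(20) == (517, 7031), matching the table rows above.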

To be a bit more concrete, let's take a very pathological case as an
example: we run discovery from a copy of mozilla-try to a more recent
version of mozilla-unified. The mozilla-unified heads are unknown to the
mozilla-try repo, and there are over 1 million "missing" changesets. (The
discovery is run locally to avoid network interference.)

Without this change, the discovery:
- lasts 1858 seconds (31 minutes),
- does 1700 roundtrips,
- asks about 340 000 nodes.

With this change, the discovery:
- lasts 218 seconds (3 minutes 38 seconds, an 88% improvement),
- does 94 roundtrips (-94%),
- asks about 344 211 nodes (+1%).
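
(For scale: asking about 340 000 nodes over 1 700 roundtrips averages
exactly 340 000 / 200 = 1 700 rounds at the fixed sample size of 200, while
the growing sample asks a similar total in only 94 rounds.)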

Of course, this is an extreme case (and 3 minutes is still slow). However,
it gives a good example of how this sample size increase acts as a safety
net catching bad situations.

We could imagine a steeper increase than 5%. For example, 10% would give
the following numbers:

round trip    |    1 |     5 |    10 |     20 |     50  |        75 |        100 |
sample size   |  200 |   321 |   514 |  1 326 |  23 060 |   249 812 |  2 706 594 |
covered nodes |  200 | 1 541 | 3 690 | 12 671 | 251 871 | 2 746 254 | 29 770 966 |
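
(Running the sketch above with growth=1.1 reproduces this table as well.)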

In parallel, it is useful to understand these pathological cases and
improve them. However, the current change provides a general-purpose safety
net that smooths the impact of pathological cases.

To avoid issues with older HTTP servers, the increase in sample size only
occurs if the protocol has no limit on command argument size.
Martin von Zweigbergk via Mercurial-devel - May 21, 2019, 4:05 p.m.
On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:

> [...]
>
> To be a bit more concrete, let's take a very pathological case as an
> example: we run discovery from a copy of mozilla-try to a more recent
> version of mozilla-unified. The mozilla-unified heads are unknown to the
> mozilla-try repo, and there are over 1 million "missing" changesets. (The
> discovery is run locally to avoid network interference.)
>

What makes it slow in that case? If the 1 million changesets are mostly
linear, it should still be fast, right? So I assume they're spread out
across branches. Is it slow because the remote side has many branches that
the local side does not, or the other way around?


Pierre-Yves David - May 21, 2019, 5:09 p.m.
On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
> [...]
> What makes it slow in that case? If the 1 million changesets are mostly 
> linear, it should still be fast, right? So I assume they're spread out 
> across branches. Is it slow because the remote side has many branches 
> that the local side does not, or the other way around?

The million changesets are on many different branches. (The discovery is
run from mozilla-try (many changesets, tree-like) toward mozilla-unified
(mostly linear).)

(Note that the point of this test is not so much to look into how to make
that scenario faster, but to show a bad scenario and how a generic safety
net makes it more bearable.)

Martin von Zweigbergk via Mercurial-devel - May 21, 2019, 6:10 p.m.
On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:

> [...]
>
> The million changesets are on many different branches. (The discovery is
> run from mozilla-try (many changesets, tree-like) toward mozilla-unified
> (mostly linear).)
>

I think that means that there are many local heads. Maybe it's the case I
tried to address with https://phab.mercurial-scm.org/D2647. I just
re-opened that. I'll see if I can get timing numbers for the repos you
mentioned here.


Martin von Zweigbergk via Mercurial-devel - May 21, 2019, 8 p.m.
On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk
<martinvonz@google.com> wrote:

> [...]
>
> I think that means that there are many local heads. Maybe it's the case I
> tried to address with https://phab.mercurial-scm.org/D2647. I just
> re-opened that. I'll see if I can get timing numbers for the repos you
> mentioned here.
>

That's now done. I've added you as reviewer. I feel like that's a better
solution for this particular case. It results in significantly more bytes
transferred (+25%), but it's also much faster.


Pierre-Yves David - May 21, 2019, 8:24 p.m.
On 5/21/19 10:00 PM, Martin von Zweigbergk wrote:
> [...]
>
> I think that means that there are many local heads. Maybe it's the case I
> tried to address with https://phab.mercurial-scm.org/D2647. I just
> re-opened that. I'll see if I can get timing numbers for the repos you
> mentioned here.

I had a look at this commit. The idea in your commit message is 
interesting, but I see a different one implemented in the code. See my 
comment for details.

> That's now done. I've added you as reviewer. I feel like that's a better 
> solution for this particular case. It results in significantly more 
> bytes transferred (+25%), but it's also much faster.

As said before, in the scope of this series I am not interested in a
better solution for any particular case. I am working on a general-purpose
safety net that improves the upper-bound roundtrip complexity of every
pathological case.

The good point of this safety net is that it is not mutually exclusive
with improving particular cases. For example, your change and my change
could peacefully coexist.

Can we move forward with this patch before discussing more specific
changes?

Cheers,
Martin von Zweigbergk via Mercurial-devel - May 21, 2019, 8:40 p.m.
On Tue, May 21, 2019 at 1:24 PM Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:

> [...]
>
> Can we move forward with this patch before discussing more specific
> changes?
>

I'm not sure what other pathological cases remain after that one is fixed,
but sure, I'll queue this anyway.


Pierre-Yves David - May 21, 2019, 11:10 p.m.
On 5/21/19 10:40 PM, Martin von Zweigbergk wrote:
> [...]
>
> I'm not sure what other pathological cases remain after that one is
> fixed, but sure, I'll queue this anyway.

The nice thing with this patch is that it does not matter: whatever other
pathological cases lurk in the shadows get better ;-)

Patch

diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
--- a/mercurial/setdiscovery.py
+++ b/mercurial/setdiscovery.py
@@ -256,7 +256,8 @@  def findcommonheads(ui, local, remote,
                     initialsamplesize=100,
                     fullsamplesize=200,
                     abortwhenunrelated=True,
-                    ancestorsof=None):
+                    ancestorsof=None,
+                    samplegrowth=1.05):
     '''Return a tuple (common, anyincoming, remoteheads) used to identify
     missing nodes from or in remote.
     '''
@@ -389,6 +390,8 @@  def findcommonheads(ui, local, remote,
                 ui.debug("taking initial sample\n")
             samplefunc = disco.takefullsample
             targetsize = fullsamplesize
+            if not remote.limitedarguments:
+                fullsamplesize = int(fullsamplesize * samplegrowth)
         else:
             # use even cheaper initial sample
             ui.debug("taking quick initial sample\n")
diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
--- a/tests/test-setdiscovery.t
+++ b/tests/test-setdiscovery.t
@@ -980,10 +980,10 @@  One with >200 heads. We now switch to se
   query 3; still undecided: 980, sample size is: 200
   sampling from both directions
   searching: 4 queries
-  query 4; still undecided: \d+, sample size is: 200 (re)
+  query 4; still undecided: 435, sample size is: 210
   sampling from both directions
   searching: 5 queries
-  query 5; still undecided: 195, sample size is: 195
+  query 5; still undecided: 185, sample size is: 185
   5 total queries in *.????s (glob)
   elapsed time:  * seconds (glob)
   heads summary: