Patchwork [1,of,2,stable] discovery: limit 'all local heads known remotely' to real 'all' (issue4438)

login
register
mail settings
Submitter Mads Kiilerich
Date Nov. 5, 2014, 12:06 p.m.
Message ID <2396965a3523fce4059b.1415189210@ssl.google-analytics.com>
Download mbox | patch
Permalink /patch/6574/
State Accepted
Headers show

Comments

Mads Kiilerich - Nov. 5, 2014, 12:06 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1415189129 -3600
#      Wed Nov 05 13:05:29 2014 +0100
# Branch stable
# Node ID 2396965a3523fce4059beba5c6c8ff9873000afd
# Parent  a3c2d92112948b2fa1ac3881920ac4d932cda6c8
discovery: limit 'all local heads known remotely' to real 'all' (issue4438)

3ef893520a85 made it possible that the initial head check didn't include all
heads. If that is the case, don't use the early exit just because this random
sample happened to be 'all known'.

Note: the randomness in the discovery protocol can make this problem hard to
reproduce.
Martin von Zweigbergk - Nov. 5, 2014, 4:48 p.m.
On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich <mads@kiilerich.com> wrote:
>
> Note: the randomness in the discovery protocol can make this problem hard
> to
> reproduce.
>

A little off-topic, but is there a way to reduce the randomness? Would
simply always using the same random seed be okay? Or basing it on some
repository state?
Matt Mackall - Nov. 5, 2014, 5:07 p.m.
On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk wrote:
> On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich <mads@kiilerich.com> wrote:
> >
> > Note: the randomness in the discovery protocol can make this problem hard
> > to
> > reproduce.
> >
> 
> A little off-topic, but is there a way to reduce the randomness? Would
> simply always using the same random seed be okay? Or basing it on some
> repository state?

We've actually got this tidbit in commands:debugsetdiscovery:

    # make sure tests are repeatable
    random.seed(12323)

One of the benefits of NOT having such a seed is that we sometimes
discover bugs when the RNG takes us somewhere new.
Martin von Zweigbergk - Nov. 5, 2014, 5:12 p.m.
On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall <mpm@selenic.com> wrote:

> On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk wrote:
> > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich <mads@kiilerich.com>
> wrote:
> > >
> > > Note: the randomness in the discovery protocol can make this problem
> hard
> > > to
> > > reproduce.
> > >
> >
> > A little off-topic, but is there a way to reduce the randomness? Would
> > simply always using the same random seed be okay? Or basing it on some
> > repository state?
>
> We've actually got this tidbit in commands:debugsetdiscovery:
>
>     # make sure tests are repeatable
>     random.seed(12323)
>
> One of the benefits of NOT having such a seed is that we sometimes
> discover bugs when the RNG takes us somewhere new.
>

Makes sense. Is that just by accident, though? I.e, did you consider always
using the same seed (or, again, based on some repo state)? An obvious
benefit of using a deterministic seed is that you can just re-run a failed
pull with --traceback, which I suppose you cannot currently do.
Pierre-Yves David - Nov. 5, 2014, 5:14 p.m.
On 11/05/2014 05:12 PM, Martin von Zweigbergk wrote:
>
>
> On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall <mpm@selenic.com
> <mailto:mpm@selenic.com>> wrote:
>
>     On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk wrote:
>      > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich
>     <mads@kiilerich.com <mailto:mads@kiilerich.com>> wrote:
>      > >
>      > > Note: the randomness in the discovery protocol can make this
>     problem hard
>      > > to
>      > > reproduce.
>      > >
>      >
>      > A little off-topic, but is there a way to reduce the randomness?
>     Would
>      > simply always using the same random seed be okay? Or basing it on
>     some
>      > repository state?
>
>     We've actually got this tidbit in commands:debugsetdiscovery:
>
>          # make sure tests are repeatable
>          random.seed(12323)
>
>     One of the benefits of NOT having such a seed is that we sometimes
>     discover bugs when the RNG takes us somewhere new.
>
> Makes sense. Is that just by accident, though? I.e, did you consider
> always using the same seed (or, again, based on some repo state)? An
> obvious benefit of using a deterministic seed is that you can just
> re-run a failed pull with --traceback, which I suppose you cannot
> currently do.

The approach we use in the test is to display the seed used. Could 
include the seed used when discovery failed to be able to repro more easily.
Martin von Zweigbergk - Nov. 5, 2014, 5:17 p.m.
On Wed Nov 05 2014 at 9:14:29 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/05/2014 05:12 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall <mpm@selenic.com
> > <mailto:mpm@selenic.com>> wrote:
> >
> >     On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk wrote:
> >      > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich
> >     <mads@kiilerich.com <mailto:mads@kiilerich.com>> wrote:
> >      > >
> >      > > Note: the randomness in the discovery protocol can make this
> >     problem hard
> >      > > to
> >      > > reproduce.
> >      > >
> >      >
> >      > A little off-topic, but is there a way to reduce the randomness?
> >     Would
> >      > simply always using the same random seed be okay? Or basing it on
> >     some
> >      > repository state?
> >
> >     We've actually got this tidbit in commands:debugsetdiscovery:
> >
> >          # make sure tests are repeatable
> >          random.seed(12323)
> >
> >     One of the benefits of NOT having such a seed is that we sometimes
> >     discover bugs when the RNG takes us somewhere new.
> >
> > Makes sense. Is that just by accident, though? I.e, did you consider
> > always using the same seed (or, again, based on some repo state)? An
> > obvious benefit of using a deterministic seed is that you can just
> > re-run a failed pull with --traceback, which I suppose you cannot
> > currently do.
>
> The approach we use in the test is to display the seed used. Could
> include the seed used when discovery failed to be able to repro more
> easily.
>

Oh, I see. So the seed is deterministic outside of tests as well; it's just
printed only in tests. Sorry about the noise then.
Pierre-Yves David - Nov. 5, 2014, 5:21 p.m.
On 11/05/2014 05:17 PM, Martin von Zweigbergk wrote:
>
>
> On Wed Nov 05 2014 at 9:14:29 AM Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 11/05/2014 05:12 PM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall <mpm@selenic.com
>     <mailto:mpm@selenic.com>
>      > <mailto:mpm@selenic.com <mailto:mpm@selenic.com>>> wrote:
>      >
>      >     On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk wrote:
>      >      > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich
>      >     <mads@kiilerich.com <mailto:mads@kiilerich.com>
>     <mailto:mads@kiilerich.com <mailto:mads@kiilerich.com>>> wrote:
>      >      > >
>      >      > > Note: the randomness in the discovery protocol can make this
>      >     problem hard
>      >      > > to
>      >      > > reproduce.
>      >      > >
>      >      >
>      >      > A little off-topic, but is there a way to reduce the
>     randomness?
>      >     Would
>      >      > simply always using the same random seed be okay? Or
>     basing it on
>      >     some
>      >      > repository state?
>      >
>      >     We've actually got this tidbit in commands:debugsetdiscovery:
>      >
>      >          # make sure tests are repeatable
>      >          random.seed(12323)
>      >
>      >     One of the benefits of NOT having such a seed is that we
>     sometimes
>      >     discover bugs when the RNG takes us somewhere new.
>      >
>      > Makes sense. Is that just by accident, though? I.e, did you consider
>      > always using the same seed (or, again, based on some repo state)? An
>      > obvious benefit of using a deterministic seed is that you can just
>      > re-run a failed pull with --traceback, which I suppose you cannot
>      > currently do.
>
>     The approach we use in the test is to display the seed used. Could
>     include the seed used when discovery failed to be able to repro more
>     easily.
>
>
> Oh, I see. So the seed is deterministic outside of tests as well; it's
> just printed only in tests. Sorry about the noise then.

It is currently not deterministic outside of test.

What I'm trying to says is:
- non deterministic seed is good for wider testing
- having the ability to fix the seed is good for reproductability

We use a random but enforceable seed for python hashing in the test. We 
could use the same approach for discovery.
Martin von Zweigbergk - Nov. 5, 2014, 5:33 p.m.
On Wed Nov 05 2014 at 9:21:26 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/05/2014 05:17 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Wed Nov 05 2014 at 9:14:29 AM Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> > wrote:
> >
> >
> >
> >     On 11/05/2014 05:12 PM, Martin von Zweigbergk wrote:
> >      >
> >      >
> >      > On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall <mpm@selenic.com
> >     <mailto:mpm@selenic.com>
> >      > <mailto:mpm@selenic.com <mailto:mpm@selenic.com>>> wrote:
> >      >
> >      >     On Wed, 2014-11-05 at 16:48 +0000, Martin von Zweigbergk
> wrote:
> >      >      > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich
> >      >     <mads@kiilerich.com <mailto:mads@kiilerich.com>
> >     <mailto:mads@kiilerich.com <mailto:mads@kiilerich.com>>> wrote:
> >      >      > >
> >      >      > > Note: the randomness in the discovery protocol can make
> this
> >      >     problem hard
> >      >      > > to
> >      >      > > reproduce.
> >      >      > >
> >      >      >
> >      >      > A little off-topic, but is there a way to reduce the
> >     randomness?
> >      >     Would
> >      >      > simply always using the same random seed be okay? Or
> >     basing it on
> >      >     some
> >      >      > repository state?
> >      >
> >      >     We've actually got this tidbit in commands:debugsetdiscovery:
> >      >
> >      >          # make sure tests are repeatable
> >      >          random.seed(12323)
> >      >
> >      >     One of the benefits of NOT having such a seed is that we
> >     sometimes
> >      >     discover bugs when the RNG takes us somewhere new.
> >      >
> >      > Makes sense. Is that just by accident, though? I.e, did you
> consider
> >      > always using the same seed (or, again, based on some repo state)?
> An
> >      > obvious benefit of using a deterministic seed is that you can just
> >      > re-run a failed pull with --traceback, which I suppose you cannot
> >      > currently do.
> >
> >     The approach we use in the test is to display the seed used. Could
> >     include the seed used when discovery failed to be able to repro more
> >     easily.
> >
> >
> > Oh, I see. So the seed is deterministic outside of tests as well; it's
> > just printed only in tests. Sorry about the noise then.
>
> It is currently not deterministic outside of test.
>
> What I'm trying to says is:
> - non deterministic seed is good for wider testing
> - having the ability to fix the seed is good for reproductability
>
> We use a random but enforceable seed for python hashing in the test. We
> could use the same approach for discovery.
>

 Ah, now I understand (after looking at run-tests). I'm for making it
possible to reproduce a failure in production. I suppose that means moving
the PYTHONHASHSEED stuff into core. I would honestly prefer a fixed seed,
so even users who have no idea about the implementation will see
deterministic behavior (barring network delays and outages etc, of course).
Pierre-Yves David - Nov. 5, 2014, 5:39 p.m.
On 11/05/2014 05:33 PM, Martin von Zweigbergk wrote:
>
>
> On Wed Nov 05 2014 at 9:21:26 AM Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 11/05/2014 05:17 PM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Wed Nov 05 2014 at 9:14:29 AM Pierre-Yves David
>      > <pierre-yves.david@ens-lyon.__org
>     <mailto:pierre-yves.david@ens-lyon.org>
>     <mailto:pierre-yves.david@ens-__lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>>>
>      > wrote:
>      >
>      >
>      >
>      >     On 11/05/2014 05:12 PM, Martin von Zweigbergk wrote:
>      >      >
>      >      >
>      >      > On Wed Nov 05 2014 at 9:07:15 AM Matt Mackall
>     <mpm@selenic.com <mailto:mpm@selenic.com>
>      >     <mailto:mpm@selenic.com <mailto:mpm@selenic.com>>
>      >      > <mailto:mpm@selenic.com <mailto:mpm@selenic.com>
>     <mailto:mpm@selenic.com <mailto:mpm@selenic.com>>>> wrote:
>      >      >
>      >      >     On Wed, 2014-11-05 at 16:48 +0000, Martin von
>     Zweigbergk wrote:
>      >      >      > On Wed Nov 05 2014 at 4:07:26 AM Mads Kiilerich
>      >      >     <mads@kiilerich.com <mailto:mads@kiilerich.com>
>     <mailto:mads@kiilerich.com <mailto:mads@kiilerich.com>>
>      >     <mailto:mads@kiilerich.com <mailto:mads@kiilerich.com>
>     <mailto:mads@kiilerich.com <mailto:mads@kiilerich.com>>>> wrote:
>      >      >      > >
>      >      >      > > Note: the randomness in the discovery protocol
>     can make this
>      >      >     problem hard
>      >      >      > > to
>      >      >      > > reproduce.
>      >      >      > >
>      >      >      >
>      >      >      > A little off-topic, but is there a way to reduce the
>      >     randomness?
>      >      >     Would
>      >      >      > simply always using the same random seed be okay? Or
>      >     basing it on
>      >      >     some
>      >      >      > repository state?
>      >      >
>      >      >     We've actually got this tidbit in
>     commands:debugsetdiscovery:
>      >      >
>      >      >          # make sure tests are repeatable
>      >      >          random.seed(12323)
>      >      >
>      >      >     One of the benefits of NOT having such a seed is that we
>      >     sometimes
>      >      >     discover bugs when the RNG takes us somewhere new.
>      >      >
>      >      > Makes sense. Is that just by accident, though? I.e, did
>     you consider
>      >      > always using the same seed (or, again, based on some repo
>     state)? An
>      >      > obvious benefit of using a deterministic seed is that you
>     can just
>      >      > re-run a failed pull with --traceback, which I suppose you
>     cannot
>      >      > currently do.
>      >
>      >     The approach we use in the test is to display the seed used.
>     Could
>      >     include the seed used when discovery failed to be able to
>     repro more
>      >     easily.
>      >
>      >
>      > Oh, I see. So the seed is deterministic outside of tests as well;
>     it's
>      > just printed only in tests. Sorry about the noise then.
>
>     It is currently not deterministic outside of test.
>
>     What I'm trying to says is:
>     - non deterministic seed is good for wider testing
>     - having the ability to fix the seed is good for reproductability
>
>     We use a random but enforceable seed for python hashing in the test. We
>     could use the same approach for discovery.
>
>
>   Ah, now I understand (after looking at run-tests). I'm for making it
> possible to reproduce a failure in production. I suppose that means
> moving the PYTHONHASHSEED stuff into core. I would honestly prefer a
> fixed seed, so even users who have no idea about the implementation will
> see deterministic behavior (barring network delays and outages etc, of
> course).

Note that the PYTHONHASHSEED is a different logic than the discovery 
one. Including the pythonhashseed in mercurial bug report may be valuable.

We could allow enforcing the discovery seed through a config variable. 
and print the currently used one in a debug message (and when we crash ?)

Patch

diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
--- a/mercurial/setdiscovery.py
+++ b/mercurial/setdiscovery.py
@@ -165,7 +165,7 @@  def findcommonheads(ui, local, remote,
         ui.debug("all remote heads known locally\n")
         return (srvheadhashes, False, srvheadhashes,)
 
-    if sample and util.all(yesno):
+    if sample and len(ownheads) <= initialsamplesize and util.all(yesno):
         ui.note(_("all local heads known remotely\n"))
         ownheadhashes = dag.externalizeall(ownheads)
         return (ownheadhashes, True, srvheadhashes,)