Patchwork [1,of,5] graphmod: add a function for topological iteration

login
register
mail settings
Submitter Pierre-Yves David
Date Nov. 18, 2014, 11:56 p.m.
Message ID <06767ea27aca31cd1e65.1416354967@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/6779/
State Superseded
Headers show

Comments

Pierre-Yves David - Nov. 18, 2014, 11:56 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1409847572 -7200
#      Thu Sep 04 18:19:32 2014 +0200
# Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
# Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
graphmod: add a function for topological iteration

This changesets introduce a function to perform topological (one branch after
the other) iteration over a set of changeset. This first version have a lot of
limitation but the approach should be flexible enough to allow all kind of
improvement in the future. This changeset aims at setting the first stone more
than providing a final solution.

The algorithm works does not needs to know the whole set of nodes involved
before emitting revision. So make it a good candidate for usage in place like
`hg log` or graphical tools that need a fast first result time.
Martin von Zweigbergk - Nov. 19, 2014, 5:35 p.m.
On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1409847572 -7200
> #      Thu Sep 04 18:19:32 2014 +0200
> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
> graphmod: add a function for topological iteration
>
> This changesets introduce a function to perform topological (one branch
> after
> the other) iteration over a set of changeset. This first version have a
> lot of
> limitation but the approach should be flexible enough to allow all kind of
> improvement in the future. This changeset aims at setting the first stone
> more
> than providing a final solution.
>
> The algorithm works does not needs to know the whole set of nodes involved
> before emitting revision. So make it a good candidate for usage in place
> like
> `hg log` or graphical tools that need a fast first result time.
>
> diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> --- a/mercurial/graphmod.py
> +++ b/mercurial/graphmod.py
> @@ -20,10 +20,160 @@ Data depends on type.
>  from mercurial.node import nullrev
>  import util
>
>  CHANGESET = 'C'
>
> +def topoiter(revs, parentsfunc):
> +    """topologically iter over a set of revision, one branch at a time.
> +
> +    Currently does not handle non contigious <revs> input.
> +
> +    Currently consider every changeset under a merge to be on the same
> branch
> +    using revision number to sort them.
>

Any plans for how to extend the algorithm for that case? Not asking you to
code it up, just checking that it's not too hard to extend for that case.


> +
> +    Could be easily extend to give priority to an initial branch.
> +
> +    The revision are emitted heads to roots."""
> +    ### Quick summary of the algorithm
> +    #
> +    # This function is based around a "retention" principle. We keep
> revision
> +    # in memory until we are ready to emit a whole branch that immediately
> +    # "merge" into an existing one. This reduce the number of branch
> "ongoing"
> +    # at the same time.
> +    #
> +    # During iteration revs are split into two groups:
> +    # A) revision already emitted
> +    # B) revision in "retention". They are stored as different subgroup.
> +    #
> +    # for each REV, we do the follow logic:
> +    #
> +    #   if REV is a parent of (A), we will emit it. But before emitting
> it,
> +    #   we'll "free" all the revs from subgroup in (B) that were waiting
> for
> +    #   REV to be available.
> +    #
> +    #   else, we'll search for a subgroup if (B) awaiting for this
> revision to
> +    #   be available, if such group exist, we add REV to it and the
> subgroup is
> +    #   now awaiting for REV.parents() to be available.
> +    #
> +    #   finally if no such group existed in (B), we create a new subgroup.
> +    #
> +    #
> +    # To bootstrap the algorithm, we display the first revision we saw.
>

What does this line refer to? There is no explicit bootstrapping here. It
seems like it flows from the "if no unblocked revisions exist, pick
arbitrary group", but I'm not sure.


> +
> +    revs.sort(reverse=True)
>

For someone not familiar with the revset (?) implementation, this seems to
go against the goal of incrementally producing the revisions. Is
"sort(reverse=True)" simply not doing anything to a lazy revset that is not
already explicitly sorted in some other order?


> +
> +    # set of parents of revision that have been yield. They can be
> considered
> +    # unblocked as the graph generator is already aware of them so there
> is no
> +    # need to delay the one that reference them.
> +    unblocked = set()
> +
> +    # list of group waiting to be displayed, each group is defined by:
> +    #
> +    #   (revs:    lists of revs waiting to be displayed,
> +    #    blocked: set of that cannot be displayed before those in the
> sets)
> +    #
> +    # The second value  correspond to parents of any revision in the group
> +    # that is not itself contained in the group. The main idea of this
> +    # algorithm is to delay as much as possible the emission of any
> revision.
> +    # This means waiting for the moment we are about to display theses
> +    # parents to display the revs in a group.
> +    #
> +    # This first implementation is smart until it meet a merge: it will
> +    # emit revs as soon as any parents is about to be emitted and can
> +    # grow an arbitrary number of revs in `blocked`. In practice this mean
> +    # we properly retains new branches but does not any special ordering
> +    # for ancestors of merges. The implementation can be improved to
> handle
> +    # this better.
> +    #
> +    # the first group is a special group. It correspond to all the
> revision
> +    # that were already emitted. the <revs> lists is expected to be empty
> +    # and the <blocked> set contains the parents revisions of already
> emitted
> +    # revision.
> +    #
> +    # You could pre-seed the <parents> set of groups[0] to a specific
> +    # changesets to select what the first emitted branch should be.
> +    #
> +    # We do not support revisions will hole yet, but adding such support
> +    # would be easy. The iteration will have to be done using both input
> +    # revision and parents (see cl.ancestors function + a few tweaks) but
> +    # only revisions parts of the initial set should be emitted.
> +    groups = [([], unblocked)]
> +    for current in revs:
> +        # Look for a group awaiting on the current revision.
> +        matching = [i for i, g in enumerate(groups) if current in g[1]]
> +
> +        if matching:
> +            # The main idea is to gather together all set that await on
> the
> +            # same revision.
> +            #
> +            # this merging is done at the time we are about to add this
> common
> +            # awaited to the group for simplicity purpose. Such merge
> could
> +            # happen sooner when we update the "blocked" set of revision.
> +            #
> +            # We also always keep the oldest group first. We can
> +            # probably improve the behavior by having the longuest set
> +            # first. That way, graph algorythms could minimise the
> +            # length of parallele lines their draw. This is currently
> +            # not done.
> +            targetidx = matching.pop(0)
> +            trevs, tparents = groups[targetidx]
> +            for i in matching:
> +                gr = groups[i]
> +                trevs.extend(gr[0])
> +                tparents |= gr[1]
> +            # delete all merged groups (but the one we keep)
> +            # (starting from the last group for performance and sanity
> reason)
> +            for i in reversed(matching):
> +                del groups[i]
> +        else:
> +            # his is a new head we create a new group for it.
> +            targetidx = len(groups)
> +            groups.append(([], set([current])))
> +
> +        gr = groups[targetidx]
> +
> +        # We now adds the current nodes to this groups. This is done after
> +        # the group merging because all elements from a group that relied
> on
> +        # this rev must preceed it.
> +        #
> +        # we also update the <parents> set to includes the parents on the
> +        # new nodes.
> +        gr[0].append(current)
> +        gr[1].remove(current)
> +        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> +
> +        # Look for a group to display
> +        #
> +        # When unblocked is empty (if clause), We are not waiting over any
> +        # revision during the first iteration (if no priority was given)
> or if
> +        # we outputed a whole disconnected sets of the graph (reached a
> root).
> +        # In that case we arbitrarily takes the oldest known group. The
> +        # heuristique could probably be better.
> +        #
> +        # Otherwise (elif clause) this mean we have some emitted revision.
> +        # if the group awaits on the same revision that the outputed ones,
> +        # we can safely output it.
> +        if not unblocked:
> +            if len(groups) > 1:  # display other subset
> +                targetidx = 1
> +                gr = groups[1]
> +        elif not gr[1] & unblocked:
> +            gr = None
> +
> +        if gr is not None:
> +            # update the set of awaited revisions with the one from the
> group
> +            unblocked |= gr[1]
> +            # output all revisions in the group
> +            for r in gr[0]:
> +                yield r
> +            # deleted the group that you just outputed.
> +            # unless it is group[0] in which case you just empty it.
> +            if targetidx:
> +                del groups[targetidx]
> +            else:
> +                gr[0][:] = []
> +
>  def dagwalker(repo, revs):
>      """cset DAG generator yielding (id, CHANGESET, ctx, [parentids])
> tuples
>
>      This generator function walks through revisions (which should be
> ordered
>      from bigger to lower). It returns a tuple for each node. The node and
> parent
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Augie Fackler - Nov. 25, 2014, 3:54 p.m.
On Wed, Nov 19, 2014 at 05:35:05PM +0000, Martin von Zweigbergk wrote:
> On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
>
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david@fb.com>
> > # Date 1409847572 -7200
> > #      Thu Sep 04 18:19:32 2014 +0200
> > # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> > # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
> > graphmod: add a function for topological iteration
> >
> > This changesets introduce a function to perform topological (one branch
> > after
> > the other) iteration over a set of changeset. This first version have a
> > lot of
> > limitation but the approach should be flexible enough to allow all kind of
> > improvement in the future. This changeset aims at setting the first stone
> > more
> > than providing a final solution.
> >
> > The algorithm works does not needs to know the whole set of nodes involved
> > before emitting revision. So make it a good candidate for usage in place
> > like
> > `hg log` or graphical tools that need a fast first result time.
> >
> > diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> > --- a/mercurial/graphmod.py
> > +++ b/mercurial/graphmod.py
> > @@ -20,10 +20,160 @@ Data depends on type.
> >  from mercurial.node import nullrev
> >  import util
> >
> >  CHANGESET = 'C'
> >
> > +def topoiter(revs, parentsfunc):
> > +    """topologically iter over a set of revision, one branch at a time.
> > +
> > +    Currently does not handle non contigious <revs> input.
> > +
> > +    Currently consider every changeset under a merge to be on the same
> > branch
> > +    using revision number to sort them.
> >
>
> Any plans for how to extend the algorithm for that case? Not asking you to
> code it up, just checking that it's not too hard to extend for that case.
>
>
> > +
> > +    Could be easily extend to give priority to an initial branch.
> > +
> > +    The revision are emitted heads to roots."""
> > +    ### Quick summary of the algorithm
> > +    #
> > +    # This function is based around a "retention" principle. We keep
> > revision
> > +    # in memory until we are ready to emit a whole branch that immediately
> > +    # "merge" into an existing one. This reduce the number of branch
> > "ongoing"
> > +    # at the same time.
> > +    #
> > +    # During iteration revs are split into two groups:
> > +    # A) revision already emitted
> > +    # B) revision in "retention". They are stored as different subgroup.
> > +    #
> > +    # for each REV, we do the follow logic:
> > +    #
> > +    #   if REV is a parent of (A), we will emit it. But before emitting
> > it,
> > +    #   we'll "free" all the revs from subgroup in (B) that were waiting
> > for
> > +    #   REV to be available.
> > +    #
> > +    #   else, we'll search for a subgroup if (B) awaiting for this
> > revision to
> > +    #   be available, if such group exist, we add REV to it and the
> > subgroup is
> > +    #   now awaiting for REV.parents() to be available.
> > +    #
> > +    #   finally if no such group existed in (B), we create a new subgroup.
> > +    #
> > +    #
> > +    # To bootstrap the algorithm, we display the first revision we saw.
> >
>
> What does this line refer to? There is no explicit bootstrapping here. It
> seems like it flows from the "if no unblocked revisions exist, pick
> arbitrary group", but I'm not sure.
>
>
> > +
> > +    revs.sort(reverse=True)
> >
>
> For someone not familiar with the revset (?) implementation, this seems to
> go against the goal of incrementally producing the revisions. Is
> "sort(reverse=True)" simply not doing anything to a lazy revset that is not
> already explicitly sorted in some other order?

I'd also like to see the answer to this question, FWIW.

(I'm deferring any further review of this until I can chat with
marmoute in IRC, since this patch is enormous and it may be better for
us to review it using voice chat.)

>
>
> > +
> > +    # set of parents of revision that have been yield. They can be
> > considered
> > +    # unblocked as the graph generator is already aware of them so there
> > is no
> > +    # need to delay the one that reference them.
> > +    unblocked = set()
> > +
> > +    # list of group waiting to be displayed, each group is defined by:
> > +    #
> > +    #   (revs:    lists of revs waiting to be displayed,
> > +    #    blocked: set of that cannot be displayed before those in the
> > sets)
> > +    #
> > +    # The second value  correspond to parents of any revision in the group
> > +    # that is not itself contained in the group. The main idea of this
> > +    # algorithm is to delay as much as possible the emission of any
> > revision.
> > +    # This means waiting for the moment we are about to display theses
> > +    # parents to display the revs in a group.
> > +    #
> > +    # This first implementation is smart until it meet a merge: it will
> > +    # emit revs as soon as any parents is about to be emitted and can
> > +    # grow an arbitrary number of revs in `blocked`. In practice this mean
> > +    # we properly retains new branches but does not any special ordering
> > +    # for ancestors of merges. The implementation can be improved to
> > handle
> > +    # this better.
> > +    #
> > +    # the first group is a special group. It correspond to all the
> > revision
> > +    # that were already emitted. the <revs> lists is expected to be empty
> > +    # and the <blocked> set contains the parents revisions of already
> > emitted
> > +    # revision.
> > +    #
> > +    # You could pre-seed the <parents> set of groups[0] to a specific
> > +    # changesets to select what the first emitted branch should be.
> > +    #
> > +    # We do not support revisions will hole yet, but adding such support
> > +    # would be easy. The iteration will have to be done using both input
> > +    # revision and parents (see cl.ancestors function + a few tweaks) but
> > +    # only revisions parts of the initial set should be emitted.
> > +    groups = [([], unblocked)]
> > +    for current in revs:
> > +        # Look for a group awaiting on the current revision.
> > +        matching = [i for i, g in enumerate(groups) if current in g[1]]
> > +
> > +        if matching:
> > +            # The main idea is to gather together all set that await on
> > the
> > +            # same revision.
> > +            #
> > +            # this merging is done at the time we are about to add this
> > common
> > +            # awaited to the group for simplicity purpose. Such merge
> > could
> > +            # happen sooner when we update the "blocked" set of revision.
> > +            #
> > +            # We also always keep the oldest group first. We can
> > +            # probably improve the behavior by having the longuest set
> > +            # first. That way, graph algorythms could minimise the
> > +            # length of parallele lines their draw. This is currently
> > +            # not done.
> > +            targetidx = matching.pop(0)
> > +            trevs, tparents = groups[targetidx]
> > +            for i in matching:
> > +                gr = groups[i]
> > +                trevs.extend(gr[0])
> > +                tparents |= gr[1]
> > +            # delete all merged groups (but the one we keep)
> > +            # (starting from the last group for performance and sanity
> > reason)
> > +            for i in reversed(matching):
> > +                del groups[i]
> > +        else:
> > +            # his is a new head we create a new group for it.
> > +            targetidx = len(groups)
> > +            groups.append(([], set([current])))
> > +
> > +        gr = groups[targetidx]
> > +
> > +        # We now adds the current nodes to this groups. This is done after
> > +        # the group merging because all elements from a group that relied
> > on
> > +        # this rev must preceed it.
> > +        #
> > +        # we also update the <parents> set to includes the parents on the
> > +        # new nodes.
> > +        gr[0].append(current)
> > +        gr[1].remove(current)
> > +        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> > +
> > +        # Look for a group to display
> > +        #
> > +        # When unblocked is empty (if clause), We are not waiting over any
> > +        # revision during the first iteration (if no priority was given)
> > or if
> > +        # we outputed a whole disconnected sets of the graph (reached a
> > root).
> > +        # In that case we arbitrarily takes the oldest known group. The
> > +        # heuristique could probably be better.
> > +        #
> > +        # Otherwise (elif clause) this mean we have some emitted revision.
> > +        # if the group awaits on the same revision that the outputed ones,
> > +        # we can safely output it.
> > +        if not unblocked:
> > +            if len(groups) > 1:  # display other subset
> > +                targetidx = 1
> > +                gr = groups[1]
> > +        elif not gr[1] & unblocked:
> > +            gr = None
> > +
> > +        if gr is not None:
> > +            # update the set of awaited revisions with the one from the
> > group
> > +            unblocked |= gr[1]
> > +            # output all revisions in the group
> > +            for r in gr[0]:
> > +                yield r
> > +            # deleted the group that you just outputed.
> > +            # unless it is group[0] in which case you just empty it.
> > +            if targetidx:
> > +                del groups[targetidx]
> > +            else:
> > +                gr[0][:] = []
> > +
> >  def dagwalker(repo, revs):
> >      """cset DAG generator yielding (id, CHANGESET, ctx, [parentids])
> > tuples
> >
> >      This generator function walks through revisions (which should be
> > ordered
> >      from bigger to lower). It returns a tuple for each node. The node and
> > parent
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@selenic.com
> > http://selenic.com/mailman/listinfo/mercurial-devel
> >

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - Nov. 25, 2014, 6:24 p.m.
(resend, apparently never made it to the list)

On 11/19/2014 09:35 AM, Martin von Zweigbergk wrote:
>
>
> On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@fb.com
>     <mailto:pierre-yves.david@fb.com>>
>     # Date 1409847572 -7200
>     #      Thu Sep 04 18:19:32 2014 +0200
>     # Node ID 06767ea27aca31cd1e65ccd32c22c4__5c1ebab83d
>     # Parent  e63941631a3f61b3323dbcc2545689__b1eb34e308
>     graphmod: add a function for topological iteration
>
>     This changesets introduce a function to perform topological (one
>     branch after
>     the other) iteration over a set of changeset. This first version
>     have a lot of
>     limitation but the approach should be flexible enough to allow all
>     kind of
>     improvement in the future. This changeset aims at setting the first
>     stone more
>     than providing a final solution.
>
>     The algorithm works does not needs to know the whole set of nodes
>     involved
>     before emitting revision. So make it a good candidate for usage in
>     place like
>     `hg log` or graphical tools that need a fast first result time.
>
>     diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
>     --- a/mercurial/graphmod.py
>     +++ b/mercurial/graphmod.py
>     @@ -20,10 +20,160 @@ Data depends on type.
>       from mercurial.node import nullrev
>       import util
>
>       CHANGESET = 'C'
>
>     +def topoiter(revs, parentsfunc):
>     +    """topologically iter over a set of revision, one branch at a time.
>     +
>     +    Currently does not handle non contigious <revs> input.
>     +
>     +    Currently consider every changeset under a merge to be on the
>     same branch
>     +    using revision number to sort them.
>
>
> Any plans for how to extend the algorithm for that case? Not asking you
> to code it up, just checking that it's not too hard to extend for that case.

The current algorithm emit revisions whenever a parent is available. We 
can extend it to wait for both parents to be available. This requires 
significant extension of the current logic but it is possible (and 
somewhat planned).

>     +    Could be easily extend to give priority to an initial branch.
>     +
>     +    The revision are emitted heads to roots."""
>     +    ### Quick summary of the algorithm
>     +    #
>     +    # This function is based around a "retention" principle. We
>     keep revision
>     +    # in memory until we are ready to emit a whole branch that
>     immediately
>     +    # "merge" into an existing one. This reduce the number of
>     branch "ongoing"
>     +    # at the same time.
>     +    #
>     +    # During iteration revs are split into two groups:
>     +    # A) revision already emitted
>     +    # B) revision in "retention". They are stored as different
>     subgroup.
>     +    #
>     +    # for each REV, we do the follow logic:
>     +    #
>     +    #   if REV is a parent of (A), we will emit it. But before
>     emitting it,
>     +    #   we'll "free" all the revs from subgroup in (B) that were
>     waiting for
>     +    #   REV to be available.
>     +    #
>     +    #   else, we'll search for a subgroup if (B) awaiting for this
>     revision to
>     +    #   be available, if such group exist, we add REV to it and the
>     subgroup is
>     +    #   now awaiting for REV.parents() to be available.
>     +    #
>     +    #   finally if no such group existed in (B), we create a new
>     subgroup.
>     +    #
>     +    #
>     +    # To bootstrap the algorithm, we display the first revision we saw.
>
>
> What does this line refer to? There is no explicit bootstrapping here.
> It seems like it flows from the "if no unblocked revisions exist, pick
> arbitrary group", but I'm not sure.

That is it.

>     +    revs.sort(reverse=True)
>
>
> For someone not familiar with the revset (?) implementation, this seems
> to go against the goal of incrementally producing the revisions. Is
> "sort(reverse=True)" simply not doing anything to a lazy revset that is
> not already explicitly sorted in some other order?

Yes revset are lazily sorted. This "sort" call mean "I would like to 
iterate over this in a sorted order" the revset do the minimal amount of 
work to make it happen.
Augie Fackler - Dec. 1, 2014, 9 p.m.
On Tue, Nov 18, 2014 at 11:56:07PM +0000, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1409847572 -7200
> #      Thu Sep 04 18:19:32 2014 +0200
> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
> graphmod: add a function for topological iteration

Pierre-Yves and I went over this verbally using a voice chat, and he's
going to send a revised version that renames the function.

In the name of reviewer and author sanity, he's going to include a
couple of follow-up patches that address some comment cleanup and
makes groups[0] not special in the algorithm.

>
> This changesets introduce a function to perform topological (one branch after
> the other) iteration over a set of changeset. This first version have a lot of
> limitation but the approach should be flexible enough to allow all kind of
> improvement in the future. This changeset aims at setting the first stone more
> than providing a final solution.
>
> The algorithm works does not needs to know the whole set of nodes involved
> before emitting revision. So make it a good candidate for usage in place like
> `hg log` or graphical tools that need a fast first result time.
>
> diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> --- a/mercurial/graphmod.py
> +++ b/mercurial/graphmod.py
> @@ -20,10 +20,160 @@ Data depends on type.
>  from mercurial.node import nullrev
>  import util
>
>  CHANGESET = 'C'
>
> +def topoiter(revs, parentsfunc):
> +    """topologically iter over a set of revision, one branch at a time.
> +
> +    Currently does not handle non contigious <revs> input.
> +
> +    Currently consider every changeset under a merge to be on the same branch
> +    using revision number to sort them.
> +
> +    Could be easily extend to give priority to an initial branch.
> +
> +    The revision are emitted heads to roots."""
> +    ### Quick summary of the algorithm
> +    #
> +    # This function is based around a "retention" principle. We keep revision
> +    # in memory until we are ready to emit a whole branch that immediately
> +    # "merge" into an existing one. This reduce the number of branch "ongoing"
> +    # at the same time.
> +    #
> +    # During iteration revs are split into two groups:
> +    # A) revision already emitted
> +    # B) revision in "retention". They are stored as different subgroup.
> +    #
> +    # for each REV, we do the follow logic:
> +    #
> +    #   if REV is a parent of (A), we will emit it. But before emitting it,
> +    #   we'll "free" all the revs from subgroup in (B) that were waiting for
> +    #   REV to be available.
> +    #
> +    #   else, we'll search for a subgroup if (B) awaiting for this revision to
> +    #   be available, if such group exist, we add REV to it and the subgroup is
> +    #   now awaiting for REV.parents() to be available.
> +    #
> +    #   finally if no such group existed in (B), we create a new subgroup.
> +    #
> +    #
> +    # To bootstrap the algorithm, we display the first revision we saw.
> +
> +    revs.sort(reverse=True)
> +
> +    # set of parents of revision that have been yield. They can be considered
> +    # unblocked as the graph generator is already aware of them so there is no
> +    # need to delay the one that reference them.
> +    unblocked = set()
> +
> +    # list of group waiting to be displayed, each group is defined by:
> +    #
> +    #   (revs:    lists of revs waiting to be displayed,
> +    #    blocked: set of that cannot be displayed before those in the sets)
> +    #
> +    # The second value  correspond to parents of any revision in the group
> +    # that is not itself contained in the group. The main idea of this
> +    # algorithm is to delay as much as possible the emission of any revision.
> +    # This means waiting for the moment we are about to display theses
> +    # parents to display the revs in a group.
> +    #
> +    # This first implementation is smart until it meet a merge: it will
> +    # emit revs as soon as any parents is about to be emitted and can
> +    # grow an arbitrary number of revs in `blocked`. In practice this mean
> +    # we properly retains new branches but does not any special ordering
> +    # for ancestors of merges. The implementation can be improved to handle
> +    # this better.
> +    #
> +    # the first group is a special group. It correspond to all the revision
> +    # that were already emitted. the <revs> lists is expected to be empty
> +    # and the <blocked> set contains the parents revisions of already emitted
> +    # revision.
> +    #
> +    # You could pre-seed the <parents> set of groups[0] to a specific
> +    # changesets to select what the first emitted branch should be.
> +    #
> +    # We do not support revisions will hole yet, but adding such support
> +    # would be easy. The iteration will have to be done using both input
> +    # revision and parents (see cl.ancestors function + a few tweaks) but
> +    # only revisions parts of the initial set should be emitted.
> +    groups = [([], unblocked)]
> +    for current in revs:
> +        # Look for a group awaiting on the current revision.
> +        matching = [i for i, g in enumerate(groups) if current in g[1]]
> +
> +        if matching:
> +            # The main idea is to gather together all set that await on the
> +            # same revision.
> +            #
> +            # this merging is done at the time we are about to add this common
> +            # awaited to the group for simplicity purpose. Such merge could
> +            # happen sooner when we update the "blocked" set of revision.
> +            #
> +            # We also always keep the oldest group first. We can
> +            # probably improve the behavior by having the longuest set
> +            # first. That way, graph algorythms could minimise the
> +            # length of parallele lines their draw. This is currently
> +            # not done.
> +            targetidx = matching.pop(0)
> +            trevs, tparents = groups[targetidx]
> +            for i in matching:
> +                gr = groups[i]
> +                trevs.extend(gr[0])
> +                tparents |= gr[1]
> +            # delete all merged groups (but the one we keep)
> +            # (starting from the last group for performance and sanity reason)
> +            for i in reversed(matching):
> +                del groups[i]
> +        else:
> +            # his is a new head we create a new group for it.
> +            targetidx = len(groups)
> +            groups.append(([], set([current])))
> +
> +        gr = groups[targetidx]
> +
> +        # We now adds the current nodes to this groups. This is done after
> +        # the group merging because all elements from a group that relied on
> +        # this rev must preceed it.
> +        #
> +        # we also update the <parents> set to includes the parents on the
> +        # new nodes.
> +        gr[0].append(current)
> +        gr[1].remove(current)
> +        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> +
> +        # Look for a group to display
> +        #
> +        # When unblocked is empty (if clause), We are not waiting over any
> +        # revision during the first iteration (if no priority was given) or if
> +        # we outputed a whole disconnected sets of the graph (reached a root).
> +        # In that case we arbitrarily takes the oldest known group. The
> +        # heuristique could probably be better.
> +        #
> +        # Otherwise (elif clause) this mean we have some emitted revision.
> +        # if the group awaits on the same revision that the outputed ones,
> +        # we can safely output it.
> +        if not unblocked:
> +            if len(groups) > 1:  # display other subset
> +                targetidx = 1
> +                gr = groups[1]
> +        elif not gr[1] & unblocked:
> +            gr = None
> +
> +        if gr is not None:
> +            # update the set of awaited revisions with the one from the group
> +            unblocked |= gr[1]
> +            # output all revisions in the group
> +            for r in gr[0]:
> +                yield r
> +            # deleted the group that you just outputed.
> +            # unless it is group[0] in which case you just empty it.
> +            if targetidx:
> +                del groups[targetidx]
> +            else:
> +                gr[0][:] = []
> +
>  def dagwalker(repo, revs):
>      """cset DAG generator yielding (id, CHANGESET, ctx, [parentids]) tuples
>
>      This generator function walks through revisions (which should be ordered
>      from bigger to lower). It returns a tuple for each node. The node and parent
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - Dec. 1, 2014, 9:53 p.m.
On 12/01/2014 01:00 PM, Augie Fackler wrote:
> On Tue, Nov 18, 2014 at 11:56:07PM +0000, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1409847572 -7200
>> #      Thu Sep 04 18:19:32 2014 +0200
>> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
>> # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
>> graphmod: add a function for topological iteration
>
> Pierre-Yves and I went over this verbally using a voice chat, and he's
> going to send a revised version that renames the function.

This function was named "topo" because Durham, Matt and I discussed this 
as topo sorting and we were thinking of a 'sort(all(),"topo")' revset. 
But Augies pointed out that it was a bit too generic and confusing as we 
already refere to topological order when iterating in a descending way.

The new name we are heading now is `groupbranchiter`.
Pierre-Yves David - Dec. 2, 2014, 5:29 p.m.
On 12/01/2014 01:00 PM, Augie Fackler wrote:
> On Tue, Nov 18, 2014 at 11:56:07PM +0000, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1409847572 -7200
>> #      Thu Sep 04 18:19:32 2014 +0200
>> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
>> # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
>> graphmod: add a function for topological iteration
>
> Pierre-Yves and I went over this verbally using a voice chat, and he's
> going to send a revised version that renames the function.
>
> In the name of reviewer and author sanity, he's going to include a
> couple of follow-up patches that address some comment cleanup and
> makes groups[0] not special in the algorithm.

So After reading the algorithm again, I realized that having the first 
group exist for useful reason (mostly, ensuring other subgroup are 
properly merged in it) and cannot easily be removed (if it is even 
wished). Resending with doc update (doc update in initial patches thanks 
to evolve-fu.

Patch

diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
--- a/mercurial/graphmod.py
+++ b/mercurial/graphmod.py
@@ -20,10 +20,160 @@  Data depends on type.
 from mercurial.node import nullrev
 import util
 
 CHANGESET = 'C'
 
+def topoiter(revs, parentsfunc):
+    """topologically iter over a set of revision, one branch at a time.
+
+    Currently does not handle non contigious <revs> input.
+
+    Currently consider every changeset under a merge to be on the same branch
+    using revision number to sort them.
+
+    Could be easily extend to give priority to an initial branch.
+
+    The revision are emitted heads to roots."""
+    ### Quick summary of the algorithm
+    #
+    # This function is based around a "retention" principle. We keep revision
+    # in memory until we are ready to emit a whole branch that immediately
+    # "merge" into an existing one. This reduce the number of branch "ongoing"
+    # at the same time.
+    #
+    # During iteration revs are split into two groups:
+    # A) revision already emitted
+    # B) revision in "retention". They are stored as different subgroup.
+    #
+    # for each REV, we do the follow logic:
+    #
+    #   if REV is a parent of (A), we will emit it. But before emitting it,
+    #   we'll "free" all the revs from subgroup in (B) that were waiting for
+    #   REV to be available.
+    #
+    #   else, we'll search for a subgroup if (B) awaiting for this revision to
+    #   be available, if such group exist, we add REV to it and the subgroup is
+    #   now awaiting for REV.parents() to be available.
+    #
+    #   finally if no such group existed in (B), we create a new subgroup.
+    #
+    #
+    # To bootstrap the algorithm, we display the first revision we saw.
+
+    revs.sort(reverse=True)
+
+    # set of parents of revision that have been yield. They can be considered
+    # unblocked as the graph generator is already aware of them so there is no
+    # need to delay the one that reference them.
+    unblocked = set()
+
+    # list of group waiting to be displayed, each group is defined by:
+    #
+    #   (revs:    lists of revs waiting to be displayed,
+    #    blocked: set of that cannot be displayed before those in the sets)
+    #
+    # The second value  correspond to parents of any revision in the group
+    # that is not itself contained in the group. The main idea of this
+    # algorithm is to delay as much as possible the emission of any revision.
+    # This means waiting for the moment we are about to display theses
+    # parents to display the revs in a group.
+    #
+    # This first implementation is smart until it meet a merge: it will
+    # emit revs as soon as any parents is about to be emitted and can
+    # grow an arbitrary number of revs in `blocked`. In practice this mean
+    # we properly retains new branches but does not any special ordering
+    # for ancestors of merges. The implementation can be improved to handle
+    # this better.
+    #
+    # the first group is a special group. It correspond to all the revision
+    # that were already emitted. the <revs> lists is expected to be empty
+    # and the <blocked> set contains the parents revisions of already emitted
+    # revision.
+    #
+    # You could pre-seed the <parents> set of groups[0] to a specific
+    # changesets to select what the first emitted branch should be.
+    #
+    # We do not support revisions will hole yet, but adding such support
+    # would be easy. The iteration will have to be done using both input
+    # revision and parents (see cl.ancestors function + a few tweaks) but
+    # only revisions parts of the initial set should be emitted.
+    groups = [([], unblocked)]
+    for current in revs:
+        # Look for a group awaiting on the current revision.
+        matching = [i for i, g in enumerate(groups) if current in g[1]]
+
+        if matching:
+            # The main idea is to gather together all set that await on the
+            # same revision.
+            #
+            # this merging is done at the time we are about to add this common
+            # awaited to the group for simplicity purpose. Such merge could
+            # happen sooner when we update the "blocked" set of revision.
+            #
+            # We also always keep the oldest group first. We can
+            # probably improve the behavior by having the longuest set
+            # first. That way, graph algorythms could minimise the
+            # length of parallele lines their draw. This is currently
+            # not done.
+            targetidx = matching.pop(0)
+            trevs, tparents = groups[targetidx]
+            for i in matching:
+                gr = groups[i]
+                trevs.extend(gr[0])
+                tparents |= gr[1]
+            # delete all merged groups (but the one we keep)
+            # (starting from the last group for performance and sanity reason)
+            for i in reversed(matching):
+                del groups[i]
+        else:
+            # his is a new head we create a new group for it.
+            targetidx = len(groups)
+            groups.append(([], set([current])))
+
+        gr = groups[targetidx]
+
+        # We now adds the current nodes to this groups. This is done after
+        # the group merging because all elements from a group that relied on
+        # this rev must preceed it.
+        #
+        # we also update the <parents> set to includes the parents on the
+        # new nodes.
+        gr[0].append(current)
+        gr[1].remove(current)
+        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
+
+        # Look for a group to display
+        #
+        # When unblocked is empty (if clause), We are not waiting over any
+        # revision during the first iteration (if no priority was given) or if
+        # we outputed a whole disconnected sets of the graph (reached a root).
+        # In that case we arbitrarily takes the oldest known group. The
+        # heuristique could probably be better.
+        #
+        # Otherwise (elif clause) this mean we have some emitted revision.
+        # if the group awaits on the same revision that the outputed ones,
+        # we can safely output it.
+        if not unblocked:
+            if len(groups) > 1:  # display other subset
+                targetidx = 1
+                gr = groups[1]
+        elif not gr[1] & unblocked:
+            gr = None
+
+        if gr is not None:
+            # update the set of awaited revisions with the one from the group
+            unblocked |= gr[1]
+            # output all revisions in the group
+            for r in gr[0]:
+                yield r
+            # deleted the group that you just outputed.
+            # unless it is group[0] in which case you just empty it.
+            if targetidx:
+                del groups[targetidx]
+            else:
+                gr[0][:] = []
+
 def dagwalker(repo, revs):
     """cset DAG generator yielding (id, CHANGESET, ctx, [parentids]) tuples
 
     This generator function walks through revisions (which should be ordered
     from bigger to lower). It returns a tuple for each node. The node and parent