Patchwork [1,of,6] changegroup: expose list of changesets from getsubsetraw

login
register
mail settings
Submitter Gregory Szorc
Date May 26, 2015, 12:42 a.m.
Message ID <ea28b7239085bbc7d561.1432600954@vm-ubuntu-main.gateway.sonic.net>
Download mbox | patch
Permalink /patch/9265/
State Changes Requested
Delegated to: Pierre-Yves David
Headers show

Comments

Gregory Szorc - May 26, 2015, 12:42 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1432589936 25200
#      Mon May 25 14:38:56 2015 -0700
# Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
# Parent  605b1d32c1c011d56233f28923ee5354fce7e426
changegroup: expose list of changesets from getsubsetraw

Currently, when generating changegroups, it is possible for the caller
to not know exactly what data is inside without parsing the returned
data stream. This is inefficient and adds complexity.

Return the list of changesets from getsubsetraw to make this data
more widely available.
Gregory Szorc - May 26, 2015, 12:48 a.m.
I truncated this series at 6 patches to appease the patchbomb gods. The
crux of this work is adding transfer of hgtags fnodes cache data as a
bundle2 part.
https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/94ac12a1fd57 and its
3 parents are ready for review. But I won't patchbomb them until this
series is queued.

Yes, I'll probably add branch map transfer as well. Maybe not today, though.

On Mon, May 25, 2015 at 5:42 PM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1432589936 25200
> #      Mon May 25 14:38:56 2015 -0700
> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
> # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
> changegroup: expose list of changesets from getsubsetraw
>
> Currently, when generating changegroups, it is possible for the caller
> to not know exactly what data is inside without parsing the returned
> data stream. This is inefficient and adds complexity.
>
> Return the list of changesets from getsubsetraw to make this data
> more widely available.
>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -549,8 +549,13 @@ def _changegroupinfo(repo, nodes, source
>          for node in nodes:
>              repo.ui.debug("%s\n" % hex(node))
>
>  def getsubsetraw(repo, outgoing, bundler, source, fastpath=False):
> +    """Obtain changegroup data.
> +
> +    Returns a list of binary changeset nodes and an iterator of raw
> +    changegroup data.
> +    """
>      repo = repo.unfiltered()
>      commonrevs = outgoing.common
>      csets = outgoing.missing
>      heads = outgoing.missingheads
> @@ -562,12 +567,12 @@ def getsubsetraw(repo, outgoing, bundler
>              repo.filtername is None and heads == sorted(repo.heads()))
>
>      repo.hook('preoutgoing', throw=True, source=source)
>      _changegroupinfo(repo, csets, source)
> -    return bundler.generate(commonrevs, csets, fastpathlinkrev, source)
> +    return csets, bundler.generate(commonrevs, csets, fastpathlinkrev,
> source)
>
>  def getsubset(repo, outgoing, bundler, source, fastpath=False,
> version='01'):
> -    gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath)
> +    csets, gengroup = getsubsetraw(repo, outgoing, bundler, source,
> fastpath)
>      return packermap[version][1](util.chunkbuffer(gengroup), 'UN')
>
>  def changegroupsubset(repo, roots, heads, source, version='01'):
>      """Compute a changegroup consisting of all the nodes that are
> @@ -602,9 +607,9 @@ def getlocalchangegroupraw(repo, source,
>      precomputed sets in outgoing. Returns a raw changegroup generator."""
>      if not outgoing.missing:
>          return None
>      bundler = packermap[version][0](repo, bundlecaps)
> -    return getsubsetraw(repo, outgoing, bundler, source)
> +    return getsubsetraw(repo, outgoing, bundler, source)[1]
>
>  def getlocalchangegroup(repo, source, outgoing, bundlecaps=None):
>      """Like getbundle, but taking a discovery.outgoing as an argument.
>
>
Pierre-Yves David - May 26, 2015, 4:32 a.m.
On 05/25/2015 05:42 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1432589936 25200
> #      Mon May 25 14:38:56 2015 -0700
> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
> # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
> changegroup: expose list of changesets from getsubsetraw
>
> Currently, when generating changegroups, it is possible for the caller
> to not know exactly what data is inside without parsing the returned
> data stream. This is inefficient and adds complexity.
>
> Return the list of changesets from getsubsetraw to make this data
> more widely available.

I'm confused. This very data is -provided- to the function. It is in 
`outgoing.missing`. Why would we need to return it?
Gregory Szorc - May 26, 2015, 4:53 a.m.
On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 05/25/2015 05:42 PM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1432589936 25200
>> #      Mon May 25 14:38:56 2015 -0700
>> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
>> # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
>> changegroup: expose list of changesets from getsubsetraw
>>
>> Currently, when generating changegroups, it is possible for the caller
>> to not know exactly what data is inside without parsing the returned
>> data stream. This is inefficient and adds complexity.
>>
>> Return the list of changesets from getsubsetraw to make this data
>> more widely available.
>>
>
> I'm confused. This very data is -provided- to the function. It is in
> `outgoing.missing`. Why would we need to return it?
>

Err, I may have gotten too carried away with looking at return values and a
handful of functions that were named very similarly. I'll send a v2.
Pierre-Yves David - May 26, 2015, 4:59 a.m.
On 05/25/2015 09:53 PM, Gregory Szorc wrote:
> On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 05/25/2015 05:42 PM, Gregory Szorc wrote:
>
>         # HG changeset patch
>         # User Gregory Szorc <gregory.szorc@gmail.com
>         <mailto:gregory.szorc@gmail.com>>
>         # Date 1432589936 25200
>         #      Mon May 25 14:38:56 2015 -0700
>         # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
>         # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
>         changegroup: expose list of changesets from getsubsetraw
>
>         Currently, when generating changegroups, it is possible for the
>         caller
>         to not know exactly what data is inside without parsing the returned
>         data stream. This is inefficient and adds complexity.
>
>         Return the list of changesets from getsubsetraw to make this data
>         more widely available.
>
>
>     I'm confused. This very data is -provided- to the function. It is in
>     `outgoing.missing`. Why would we need to return it?
>
>
> Err, I may have gotten too carried away with looking at return values
> and a handful of functions that were named very similarly. I'll send a v2.

What function remains that are not taking outgoing. In the bundle2 
context I expect about all of them to be in the modern form and take an 
outgoing object.

If it still make sense to modify some of them (And I'm not use it does , 
since old function we mostly left in place for old caller) you probably 
want to return the outgoing object that get computed internally.
Gregory Szorc - May 26, 2015, 5:07 a.m.
On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 05/25/2015 09:53 PM, Gregory Szorc wrote:
>
>> On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>> wrote:
>>
>>
>>
>>     On 05/25/2015 05:42 PM, Gregory Szorc wrote:
>>
>>         # HG changeset patch
>>         # User Gregory Szorc <gregory.szorc@gmail.com
>>         <mailto:gregory.szorc@gmail.com>>
>>         # Date 1432589936 25200
>>         #      Mon May 25 14:38:56 2015 -0700
>>         # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
>>         # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
>>         changegroup: expose list of changesets from getsubsetraw
>>
>>         Currently, when generating changegroups, it is possible for the
>>         caller
>>         to not know exactly what data is inside without parsing the
>> returned
>>         data stream. This is inefficient and adds complexity.
>>
>>         Return the list of changesets from getsubsetraw to make this data
>>         more widely available.
>>
>>
>>     I'm confused. This very data is -provided- to the function. It is in
>>     `outgoing.missing`. Why would we need to return it?
>>
>>
>> Err, I may have gotten too carried away with looking at return values
>> and a handful of functions that were named very similarly. I'll send a v2.
>>
>
> What function remains that are not taking outgoing. In the bundle2 context
> I expect about all of them to be in the modern form and take an outgoing
> object.
>
> If it still make sense to modify some of them (And I'm not use it does ,
> since old function we mostly left in place for old caller) you probably
> want to return the outgoing object that get computed internally.
>

As I'm looking at this again, I think I was somewhat justified.
getchangegroup raw calculates the discovery.outgoing instance and passes it
into getlocalchangegroupraw. It then gets passed into getsubsetraw, where
.missing is passed into bundler.bundle(). I think it is a layering
violation for getchangegroupraw (and its non-raw sibling) to assume
getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is
the thing passing a list of changesets into the bundle function, so I think
getsubsetraw should return info about what is in the changegroup. As crazy
as it sounds, we do have to change 6 functions to hook all this plumbing up
in a consistent way.
Pierre-Yves David - May 26, 2015, 5:36 a.m.
On 05/25/2015 10:07 PM, Gregory Szorc wrote:
> On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 05/25/2015 09:53 PM, Gregory Szorc wrote:
>
>         On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
>         <pierre-yves.david@ens-lyon.org
>         <mailto:pierre-yves.david@ens-lyon.org>
>         <mailto:pierre-yves.david@ens-lyon.org
>         <mailto:pierre-yves.david@ens-lyon.org>>>
>         wrote:
>
>
>
>              On 05/25/2015 05:42 PM, Gregory Szorc wrote:
>
>                  # HG changeset patch
>                  # User Gregory Szorc <gregory.szorc@gmail.com
>         <mailto:gregory.szorc@gmail.com>
>                  <mailto:gregory.szorc@gmail.com
>         <mailto:gregory.szorc@gmail.com>>>
>                  # Date 1432589936 25200
>                  #      Mon May 25 14:38:56 2015 -0700
>                  # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
>                  # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
>                  changegroup: expose list of changesets from getsubsetraw
>
>                  Currently, when generating changegroups, it is possible
>         for the
>                  caller
>                  to not know exactly what data is inside without parsing
>         the returned
>                  data stream. This is inefficient and adds complexity.
>
>                  Return the list of changesets from getsubsetraw to make
>         this data
>                  more widely available.
>
>
>              I'm confused. This very data is -provided- to the function.
>         It is in
>              `outgoing.missing`. Why would we need to return it?
>
>
>         Err, I may have gotten too carried away with looking at return
>         values
>         and a handful of functions that were named very similarly. I'll
>         send a v2.
>
>
>     What function remains that are not taking outgoing. In the bundle2
>     context I expect about all of them to be in the modern form and take
>     an outgoing object.
>
>     If it still make sense to modify some of them (And I'm not use it
>     does , since old function we mostly left in place for old caller)
>     you probably want to return the outgoing object that get computed
>     internally.
>
>
> As I'm looking at this again, I think I was somewhat justified.
> getchangegroup raw calculates the discovery.outgoing instance and passes
> it into getlocalchangegroupraw. It then gets passed into getsubsetraw,
> where .missing is passed into bundler.bundle(). I think it is a layering
> violation for getchangegroupraw (and its non-raw sibling) to assume
> getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is
> the thing passing a list of changesets into the bundle function, so I
> think getsubsetraw should return info about what is in the changegroup.
> As crazy as it sounds, we do have to change 6 functions to hook all this
> plumbing up in a consistent way.

The "outgoing" object represent a set changesets we want in a bundle and 
related "base" we can rely on. It feel like appropriate to pass it all 
along the line.
Augie Fackler - May 26, 2015, 2:32 p.m.
On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote:
>
>
> On 05/25/2015 10:07 PM, Gregory Szorc wrote:
> >On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David
> ><pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> >wrote:
> >
> >
> >
> >    On 05/25/2015 09:53 PM, Gregory Szorc wrote:
> >
> >        On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
> >        <pierre-yves.david@ens-lyon.org
> >        <mailto:pierre-yves.david@ens-lyon.org>
> >        <mailto:pierre-yves.david@ens-lyon.org
> >        <mailto:pierre-yves.david@ens-lyon.org>>>
> >        wrote:
> >
> >
> >
> >             On 05/25/2015 05:42 PM, Gregory Szorc wrote:
> >
> >                 # HG changeset patch
> >                 # User Gregory Szorc <gregory.szorc@gmail.com
> >        <mailto:gregory.szorc@gmail.com>
> >                 <mailto:gregory.szorc@gmail.com
> >        <mailto:gregory.szorc@gmail.com>>>
> >                 # Date 1432589936 25200
> >                 #      Mon May 25 14:38:56 2015 -0700
> >                 # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
> >                 # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
> >                 changegroup: expose list of changesets from getsubsetraw
> >
> >                 Currently, when generating changegroups, it is possible
> >        for the
> >                 caller
> >                 to not know exactly what data is inside without parsing
> >        the returned
> >                 data stream. This is inefficient and adds complexity.
> >
> >                 Return the list of changesets from getsubsetraw to make
> >        this data
> >                 more widely available.
> >
> >
> >             I'm confused. This very data is -provided- to the function.
> >        It is in
> >             `outgoing.missing`. Why would we need to return it?
> >
> >
> >        Err, I may have gotten too carried away with looking at return
> >        values
> >        and a handful of functions that were named very similarly. I'll
> >        send a v2.
> >
> >
> >    What function remains that are not taking outgoing. In the bundle2
> >    context I expect about all of them to be in the modern form and take
> >    an outgoing object.
> >
> >    If it still make sense to modify some of them (And I'm not use it
> >    does , since old function we mostly left in place for old caller)
> >    you probably want to return the outgoing object that get computed
> >    internally.
> >
> >
> >As I'm looking at this again, I think I was somewhat justified.
> >getchangegroup raw calculates the discovery.outgoing instance and passes
> >it into getlocalchangegroupraw. It then gets passed into getsubsetraw,
> >where .missing is passed into bundler.bundle(). I think it is a layering
> >violation for getchangegroupraw (and its non-raw sibling) to assume
> >getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is
> >the thing passing a list of changesets into the bundle function, so I
> >think getsubsetraw should return info about what is in the changegroup.
> >As crazy as it sounds, we do have to change 6 functions to hook all this
> >plumbing up in a consistent way.
>
> The "outgoing" object represent a set changesets we want in a bundle and
> related "base" we can rely on. It feel like appropriate to pass it all along
> the line.

I'm expecting a v2 of this, right?

>
> --
> Pierre-Yves David
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Gregory Szorc - May 26, 2015, 4:05 p.m.
On Tue, May 26, 2015 at 7:32 AM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote:
> >
> >
> > On 05/25/2015 10:07 PM, Gregory Szorc wrote:
> > >On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David
> > ><pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org
> >>
> > >wrote:
> > >
> > >
> > >
> > >    On 05/25/2015 09:53 PM, Gregory Szorc wrote:
> > >
> > >        On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
> > >        <pierre-yves.david@ens-lyon.org
> > >        <mailto:pierre-yves.david@ens-lyon.org>
> > >        <mailto:pierre-yves.david@ens-lyon.org
> > >        <mailto:pierre-yves.david@ens-lyon.org>>>
> > >        wrote:
> > >
> > >
> > >
> > >             On 05/25/2015 05:42 PM, Gregory Szorc wrote:
> > >
> > >                 # HG changeset patch
> > >                 # User Gregory Szorc <gregory.szorc@gmail.com
> > >        <mailto:gregory.szorc@gmail.com>
> > >                 <mailto:gregory.szorc@gmail.com
> > >        <mailto:gregory.szorc@gmail.com>>>
> > >                 # Date 1432589936 25200
> > >                 #      Mon May 25 14:38:56 2015 -0700
> > >                 # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
> > >                 # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
> > >                 changegroup: expose list of changesets from
> getsubsetraw
> > >
> > >                 Currently, when generating changegroups, it is possible
> > >        for the
> > >                 caller
> > >                 to not know exactly what data is inside without parsing
> > >        the returned
> > >                 data stream. This is inefficient and adds complexity.
> > >
> > >                 Return the list of changesets from getsubsetraw to make
> > >        this data
> > >                 more widely available.
> > >
> > >
> > >             I'm confused. This very data is -provided- to the function.
> > >        It is in
> > >             `outgoing.missing`. Why would we need to return it?
> > >
> > >
> > >        Err, I may have gotten too carried away with looking at return
> > >        values
> > >        and a handful of functions that were named very similarly. I'll
> > >        send a v2.
> > >
> > >
> > >    What function remains that are not taking outgoing. In the bundle2
> > >    context I expect about all of them to be in the modern form and take
> > >    an outgoing object.
> > >
> > >    If it still make sense to modify some of them (And I'm not use it
> > >    does , since old function we mostly left in place for old caller)
> > >    you probably want to return the outgoing object that get computed
> > >    internally.
> > >
> > >
> > >As I'm looking at this again, I think I was somewhat justified.
> > >getchangegroup raw calculates the discovery.outgoing instance and passes
> > >it into getlocalchangegroupraw. It then gets passed into getsubsetraw,
> > >where .missing is passed into bundler.bundle(). I think it is a layering
> > >violation for getchangegroupraw (and its non-raw sibling) to assume
> > >getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is
> > >the thing passing a list of changesets into the bundle function, so I
> > >think getsubsetraw should return info about what is in the changegroup.
> > >As crazy as it sounds, we do have to change 6 functions to hook all this
> > >plumbing up in a consistent way.
> >
> > The "outgoing" object represent a set changesets we want in a bundle and
> > related "base" we can rely on. It feel like appropriate to pass it all
> along
> > the line.
>
> I'm expecting a v2 of this, right?
>

Yes. Pierre-Yves requested I switch this to return the discovery.outgoing
instance and I think that makes sense.
Pierre-Yves David - May 26, 2015, 5:50 p.m.
On 05/26/2015 07:32 AM, Augie Fackler wrote:
> On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote:
>>
>>
>> On 05/25/2015 10:07 PM, Gregory Szorc wrote:
>>> On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David
>>> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
>>> wrote:
>>>
>>>
>>>
>>>     On 05/25/2015 09:53 PM, Gregory Szorc wrote:
>>>
>>>         On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David
>>>         <pierre-yves.david@ens-lyon.org
>>>         <mailto:pierre-yves.david@ens-lyon.org>
>>>         <mailto:pierre-yves.david@ens-lyon.org
>>>         <mailto:pierre-yves.david@ens-lyon.org>>>
>>>         wrote:
>>>
>>>
>>>
>>>              On 05/25/2015 05:42 PM, Gregory Szorc wrote:
>>>
>>>                  # HG changeset patch
>>>                  # User Gregory Szorc <gregory.szorc@gmail.com
>>>         <mailto:gregory.szorc@gmail.com>
>>>                  <mailto:gregory.szorc@gmail.com
>>>         <mailto:gregory.szorc@gmail.com>>>
>>>                  # Date 1432589936 25200
>>>                  #      Mon May 25 14:38:56 2015 -0700
>>>                  # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25
>>>                  # Parent  605b1d32c1c011d56233f28923ee5354fce7e426
>>>                  changegroup: expose list of changesets from getsubsetraw
>>>
>>>                  Currently, when generating changegroups, it is possible
>>>         for the
>>>                  caller
>>>                  to not know exactly what data is inside without parsing
>>>         the returned
>>>                  data stream. This is inefficient and adds complexity.
>>>
>>>                  Return the list of changesets from getsubsetraw to make
>>>         this data
>>>                  more widely available.
>>>
>>>
>>>              I'm confused. This very data is -provided- to the function.
>>>         It is in
>>>              `outgoing.missing`. Why would we need to return it?
>>>
>>>
>>>         Err, I may have gotten too carried away with looking at return
>>>         values
>>>         and a handful of functions that were named very similarly. I'll
>>>         send a v2.
>>>
>>>
>>>     What function remains that are not taking outgoing. In the bundle2
>>>     context I expect about all of them to be in the modern form and take
>>>     an outgoing object.
>>>
>>>     If it still make sense to modify some of them (And I'm not use it
>>>     does , since old function we mostly left in place for old caller)
>>>     you probably want to return the outgoing object that get computed
>>>     internally.
>>>
>>>
>>> As I'm looking at this again, I think I was somewhat justified.
>>> getchangegroup raw calculates the discovery.outgoing instance and passes
>>> it into getlocalchangegroupraw. It then gets passed into getsubsetraw,
>>> where .missing is passed into bundler.bundle(). I think it is a layering
>>> violation for getchangegroupraw (and its non-raw sibling) to assume
>>> getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is
>>> the thing passing a list of changesets into the bundle function, so I
>>> think getsubsetraw should return info about what is in the changegroup.
>>> As crazy as it sounds, we do have to change 6 functions to hook all this
>>> plumbing up in a consistent way.
>>
>> The "outgoing" object represent a set changesets we want in a bundle and
>> related "base" we can rely on. It feel like appropriate to pass it all along
>> the line.
>
> I'm expecting a v2 of this, right?

Yes.

This logic have been written by me under Matt supervision, so I'm 
expecting Greg to prove to me it is wrong if he wants to over-write it.

On the other and this is one of the very first refactoring I did in late 
2011 so I won't be surprise if it is wrong :)

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -549,8 +549,13 @@  def _changegroupinfo(repo, nodes, source
         for node in nodes:
             repo.ui.debug("%s\n" % hex(node))
 
 def getsubsetraw(repo, outgoing, bundler, source, fastpath=False):
+    """Obtain changegroup data.
+
+    Returns a list of binary changeset nodes and an iterator of raw
+    changegroup data.
+    """
     repo = repo.unfiltered()
     commonrevs = outgoing.common
     csets = outgoing.missing
     heads = outgoing.missingheads
@@ -562,12 +567,12 @@  def getsubsetraw(repo, outgoing, bundler
             repo.filtername is None and heads == sorted(repo.heads()))
 
     repo.hook('preoutgoing', throw=True, source=source)
     _changegroupinfo(repo, csets, source)
-    return bundler.generate(commonrevs, csets, fastpathlinkrev, source)
+    return csets, bundler.generate(commonrevs, csets, fastpathlinkrev, source)
 
 def getsubset(repo, outgoing, bundler, source, fastpath=False, version='01'):
-    gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath)
+    csets, gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath)
     return packermap[version][1](util.chunkbuffer(gengroup), 'UN')
 
 def changegroupsubset(repo, roots, heads, source, version='01'):
     """Compute a changegroup consisting of all the nodes that are
@@ -602,9 +607,9 @@  def getlocalchangegroupraw(repo, source,
     precomputed sets in outgoing. Returns a raw changegroup generator."""
     if not outgoing.missing:
         return None
     bundler = packermap[version][0](repo, bundlecaps)
-    return getsubsetraw(repo, outgoing, bundler, source)
+    return getsubsetraw(repo, outgoing, bundler, source)[1]
 
 def getlocalchangegroup(repo, source, outgoing, bundlecaps=None):
     """Like getbundle, but taking a discovery.outgoing as an argument.