Submitter | Gregory Szorc |
---|---|
Date | May 26, 2015, 12:42 a.m. |
Message ID | <ea28b7239085bbc7d561.1432600954@vm-ubuntu-main.gateway.sonic.net> |
Download | mbox | patch |
Permalink | /patch/9265/ |
State | Changes Requested |
Delegated to: | Pierre-Yves David |
Headers | show |
Comments
I truncated this series at 6 patches to appease the patchbomb gods. The crux of this work is adding transfer of hgtags fnodes cache data as a bundle2 part. https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/94ac12a1fd57 and its 3 parents are ready for review. But I won't patchbomb them until this series is queued. Yes, I'll probably add branch map transfer as well. Maybe not today, though. On Mon, May 25, 2015 at 5:42 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1432589936 25200 > # Mon May 25 14:38:56 2015 -0700 > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > changegroup: expose list of changesets from getsubsetraw > > Currently, when generating changegroups, it is possible for the caller > to not know exactly what data is inside without parsing the returned > data stream. This is inefficient and adds complexity. > > Return the list of changesets from getsubsetraw to make this data > more widely available. > > diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py > --- a/mercurial/changegroup.py > +++ b/mercurial/changegroup.py > @@ -549,8 +549,13 @@ def _changegroupinfo(repo, nodes, source > for node in nodes: > repo.ui.debug("%s\n" % hex(node)) > > def getsubsetraw(repo, outgoing, bundler, source, fastpath=False): > + """Obtain changegroup data. > + > + Returns a list of binary changeset nodes and an iterator of raw > + changegroup data. > + """ > repo = repo.unfiltered() > commonrevs = outgoing.common > csets = outgoing.missing > heads = outgoing.missingheads > @@ -562,12 +567,12 @@ def getsubsetraw(repo, outgoing, bundler > repo.filtername is None and heads == sorted(repo.heads())) > > repo.hook('preoutgoing', throw=True, source=source) > _changegroupinfo(repo, csets, source) > - return bundler.generate(commonrevs, csets, fastpathlinkrev, source) > + return csets, bundler.generate(commonrevs, csets, fastpathlinkrev, > source) > > def getsubset(repo, outgoing, bundler, source, fastpath=False, > version='01'): > - gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath) > + csets, gengroup = getsubsetraw(repo, outgoing, bundler, source, > fastpath) > return packermap[version][1](util.chunkbuffer(gengroup), 'UN') > > def changegroupsubset(repo, roots, heads, source, version='01'): > """Compute a changegroup consisting of all the nodes that are > @@ -602,9 +607,9 @@ def getlocalchangegroupraw(repo, source, > precomputed sets in outgoing. Returns a raw changegroup generator.""" > if not outgoing.missing: > return None > bundler = packermap[version][0](repo, bundlecaps) > - return getsubsetraw(repo, outgoing, bundler, source) > + return getsubsetraw(repo, outgoing, bundler, source)[1] > > def getlocalchangegroup(repo, source, outgoing, bundlecaps=None): > """Like getbundle, but taking a discovery.outgoing as an argument. > >
On 05/25/2015 05:42 PM, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1432589936 25200 > # Mon May 25 14:38:56 2015 -0700 > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > changegroup: expose list of changesets from getsubsetraw > > Currently, when generating changegroups, it is possible for the caller > to not know exactly what data is inside without parsing the returned > data stream. This is inefficient and adds complexity. > > Return the list of changesets from getsubsetraw to make this data > more widely available. I'm confused. This very data is -provided- to the function. It is in `outgoing.missing`. Why would we need to return it?
On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > > > On 05/25/2015 05:42 PM, Gregory Szorc wrote: > >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com> >> # Date 1432589936 25200 >> # Mon May 25 14:38:56 2015 -0700 >> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 >> # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 >> changegroup: expose list of changesets from getsubsetraw >> >> Currently, when generating changegroups, it is possible for the caller >> to not know exactly what data is inside without parsing the returned >> data stream. This is inefficient and adds complexity. >> >> Return the list of changesets from getsubsetraw to make this data >> more widely available. >> > > I'm confused. This very data is -provided- to the function. It is in > `outgoing.missing`. Why would we need to return it? > Err, I may have gotten too carried away with looking at return values and a handful of functions that were named very similarly. I'll send a v2.
On 05/25/2015 09:53 PM, Gregory Szorc wrote: > On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> > wrote: > > > > On 05/25/2015 05:42 PM, Gregory Szorc wrote: > > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com > <mailto:gregory.szorc@gmail.com>> > # Date 1432589936 25200 > # Mon May 25 14:38:56 2015 -0700 > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > changegroup: expose list of changesets from getsubsetraw > > Currently, when generating changegroups, it is possible for the > caller > to not know exactly what data is inside without parsing the returned > data stream. This is inefficient and adds complexity. > > Return the list of changesets from getsubsetraw to make this data > more widely available. > > > I'm confused. This very data is -provided- to the function. It is in > `outgoing.missing`. Why would we need to return it? > > > Err, I may have gotten too carried away with looking at return values > and a handful of functions that were named very similarly. I'll send a v2. What function remains that are not taking outgoing. In the bundle2 context I expect about all of them to be in the modern form and take an outgoing object. If it still make sense to modify some of them (And I'm not use it does , since old function we mostly left in place for old caller) you probably want to return the outgoing object that get computed internally.
On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > > > On 05/25/2015 09:53 PM, Gregory Szorc wrote: > >> On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David >> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> >> wrote: >> >> >> >> On 05/25/2015 05:42 PM, Gregory Szorc wrote: >> >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com >> <mailto:gregory.szorc@gmail.com>> >> # Date 1432589936 25200 >> # Mon May 25 14:38:56 2015 -0700 >> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 >> # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 >> changegroup: expose list of changesets from getsubsetraw >> >> Currently, when generating changegroups, it is possible for the >> caller >> to not know exactly what data is inside without parsing the >> returned >> data stream. This is inefficient and adds complexity. >> >> Return the list of changesets from getsubsetraw to make this data >> more widely available. >> >> >> I'm confused. This very data is -provided- to the function. It is in >> `outgoing.missing`. Why would we need to return it? >> >> >> Err, I may have gotten too carried away with looking at return values >> and a handful of functions that were named very similarly. I'll send a v2. >> > > What function remains that are not taking outgoing. In the bundle2 context > I expect about all of them to be in the modern form and take an outgoing > object. > > If it still make sense to modify some of them (And I'm not use it does , > since old function we mostly left in place for old caller) you probably > want to return the outgoing object that get computed internally. > As I'm looking at this again, I think I was somewhat justified. getchangegroup raw calculates the discovery.outgoing instance and passes it into getlocalchangegroupraw. It then gets passed into getsubsetraw, where .missing is passed into bundler.bundle(). I think it is a layering violation for getchangegroupraw (and its non-raw sibling) to assume getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is the thing passing a list of changesets into the bundle function, so I think getsubsetraw should return info about what is in the changegroup. As crazy as it sounds, we do have to change 6 functions to hook all this plumbing up in a consistent way.
On 05/25/2015 10:07 PM, Gregory Szorc wrote: > On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> > wrote: > > > > On 05/25/2015 09:53 PM, Gregory Szorc wrote: > > On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David > <pierre-yves.david@ens-lyon.org > <mailto:pierre-yves.david@ens-lyon.org> > <mailto:pierre-yves.david@ens-lyon.org > <mailto:pierre-yves.david@ens-lyon.org>>> > wrote: > > > > On 05/25/2015 05:42 PM, Gregory Szorc wrote: > > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com > <mailto:gregory.szorc@gmail.com> > <mailto:gregory.szorc@gmail.com > <mailto:gregory.szorc@gmail.com>>> > # Date 1432589936 25200 > # Mon May 25 14:38:56 2015 -0700 > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > changegroup: expose list of changesets from getsubsetraw > > Currently, when generating changegroups, it is possible > for the > caller > to not know exactly what data is inside without parsing > the returned > data stream. This is inefficient and adds complexity. > > Return the list of changesets from getsubsetraw to make > this data > more widely available. > > > I'm confused. This very data is -provided- to the function. > It is in > `outgoing.missing`. Why would we need to return it? > > > Err, I may have gotten too carried away with looking at return > values > and a handful of functions that were named very similarly. I'll > send a v2. > > > What function remains that are not taking outgoing. In the bundle2 > context I expect about all of them to be in the modern form and take > an outgoing object. > > If it still make sense to modify some of them (And I'm not use it > does , since old function we mostly left in place for old caller) > you probably want to return the outgoing object that get computed > internally. > > > As I'm looking at this again, I think I was somewhat justified. > getchangegroup raw calculates the discovery.outgoing instance and passes > it into getlocalchangegroupraw. It then gets passed into getsubsetraw, > where .missing is passed into bundler.bundle(). I think it is a layering > violation for getchangegroupraw (and its non-raw sibling) to assume > getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is > the thing passing a list of changesets into the bundle function, so I > think getsubsetraw should return info about what is in the changegroup. > As crazy as it sounds, we do have to change 6 functions to hook all this > plumbing up in a consistent way. The "outgoing" object represent a set changesets we want in a bundle and related "base" we can rely on. It feel like appropriate to pass it all along the line.
On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote: > > > On 05/25/2015 10:07 PM, Gregory Szorc wrote: > >On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David > ><pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> > >wrote: > > > > > > > > On 05/25/2015 09:53 PM, Gregory Szorc wrote: > > > > On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David > > <pierre-yves.david@ens-lyon.org > > <mailto:pierre-yves.david@ens-lyon.org> > > <mailto:pierre-yves.david@ens-lyon.org > > <mailto:pierre-yves.david@ens-lyon.org>>> > > wrote: > > > > > > > > On 05/25/2015 05:42 PM, Gregory Szorc wrote: > > > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com > > <mailto:gregory.szorc@gmail.com> > > <mailto:gregory.szorc@gmail.com > > <mailto:gregory.szorc@gmail.com>>> > > # Date 1432589936 25200 > > # Mon May 25 14:38:56 2015 -0700 > > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > > changegroup: expose list of changesets from getsubsetraw > > > > Currently, when generating changegroups, it is possible > > for the > > caller > > to not know exactly what data is inside without parsing > > the returned > > data stream. This is inefficient and adds complexity. > > > > Return the list of changesets from getsubsetraw to make > > this data > > more widely available. > > > > > > I'm confused. This very data is -provided- to the function. > > It is in > > `outgoing.missing`. Why would we need to return it? > > > > > > Err, I may have gotten too carried away with looking at return > > values > > and a handful of functions that were named very similarly. I'll > > send a v2. > > > > > > What function remains that are not taking outgoing. In the bundle2 > > context I expect about all of them to be in the modern form and take > > an outgoing object. > > > > If it still make sense to modify some of them (And I'm not use it > > does , since old function we mostly left in place for old caller) > > you probably want to return the outgoing object that get computed > > internally. > > > > > >As I'm looking at this again, I think I was somewhat justified. > >getchangegroup raw calculates the discovery.outgoing instance and passes > >it into getlocalchangegroupraw. It then gets passed into getsubsetraw, > >where .missing is passed into bundler.bundle(). I think it is a layering > >violation for getchangegroupraw (and its non-raw sibling) to assume > >getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is > >the thing passing a list of changesets into the bundle function, so I > >think getsubsetraw should return info about what is in the changegroup. > >As crazy as it sounds, we do have to change 6 functions to hook all this > >plumbing up in a consistent way. > > The "outgoing" object represent a set changesets we want in a bundle and > related "base" we can rely on. It feel like appropriate to pass it all along > the line. I'm expecting a v2 of this, right? > > -- > Pierre-Yves David > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On Tue, May 26, 2015 at 7:32 AM, Augie Fackler <raf@durin42.com> wrote: > On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote: > > > > > > On 05/25/2015 10:07 PM, Gregory Szorc wrote: > > >On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David > > ><pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org > >> > > >wrote: > > > > > > > > > > > > On 05/25/2015 09:53 PM, Gregory Szorc wrote: > > > > > > On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David > > > <pierre-yves.david@ens-lyon.org > > > <mailto:pierre-yves.david@ens-lyon.org> > > > <mailto:pierre-yves.david@ens-lyon.org > > > <mailto:pierre-yves.david@ens-lyon.org>>> > > > wrote: > > > > > > > > > > > > On 05/25/2015 05:42 PM, Gregory Szorc wrote: > > > > > > # HG changeset patch > > > # User Gregory Szorc <gregory.szorc@gmail.com > > > <mailto:gregory.szorc@gmail.com> > > > <mailto:gregory.szorc@gmail.com > > > <mailto:gregory.szorc@gmail.com>>> > > > # Date 1432589936 25200 > > > # Mon May 25 14:38:56 2015 -0700 > > > # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 > > > # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 > > > changegroup: expose list of changesets from > getsubsetraw > > > > > > Currently, when generating changegroups, it is possible > > > for the > > > caller > > > to not know exactly what data is inside without parsing > > > the returned > > > data stream. This is inefficient and adds complexity. > > > > > > Return the list of changesets from getsubsetraw to make > > > this data > > > more widely available. > > > > > > > > > I'm confused. This very data is -provided- to the function. > > > It is in > > > `outgoing.missing`. Why would we need to return it? > > > > > > > > > Err, I may have gotten too carried away with looking at return > > > values > > > and a handful of functions that were named very similarly. I'll > > > send a v2. > > > > > > > > > What function remains that are not taking outgoing. In the bundle2 > > > context I expect about all of them to be in the modern form and take > > > an outgoing object. > > > > > > If it still make sense to modify some of them (And I'm not use it > > > does , since old function we mostly left in place for old caller) > > > you probably want to return the outgoing object that get computed > > > internally. > > > > > > > > >As I'm looking at this again, I think I was somewhat justified. > > >getchangegroup raw calculates the discovery.outgoing instance and passes > > >it into getlocalchangegroupraw. It then gets passed into getsubsetraw, > > >where .missing is passed into bundler.bundle(). I think it is a layering > > >violation for getchangegroupraw (and its non-raw sibling) to assume > > >getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is > > >the thing passing a list of changesets into the bundle function, so I > > >think getsubsetraw should return info about what is in the changegroup. > > >As crazy as it sounds, we do have to change 6 functions to hook all this > > >plumbing up in a consistent way. > > > > The "outgoing" object represent a set changesets we want in a bundle and > > related "base" we can rely on. It feel like appropriate to pass it all > along > > the line. > > I'm expecting a v2 of this, right? > Yes. Pierre-Yves requested I switch this to return the discovery.outgoing instance and I think that makes sense.
On 05/26/2015 07:32 AM, Augie Fackler wrote: > On Mon, May 25, 2015 at 10:36:54PM -0700, Pierre-Yves David wrote: >> >> >> On 05/25/2015 10:07 PM, Gregory Szorc wrote: >>> On Mon, May 25, 2015 at 9:59 PM, Pierre-Yves David >>> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> >>> wrote: >>> >>> >>> >>> On 05/25/2015 09:53 PM, Gregory Szorc wrote: >>> >>> On Mon, May 25, 2015 at 9:32 PM, Pierre-Yves David >>> <pierre-yves.david@ens-lyon.org >>> <mailto:pierre-yves.david@ens-lyon.org> >>> <mailto:pierre-yves.david@ens-lyon.org >>> <mailto:pierre-yves.david@ens-lyon.org>>> >>> wrote: >>> >>> >>> >>> On 05/25/2015 05:42 PM, Gregory Szorc wrote: >>> >>> # HG changeset patch >>> # User Gregory Szorc <gregory.szorc@gmail.com >>> <mailto:gregory.szorc@gmail.com> >>> <mailto:gregory.szorc@gmail.com >>> <mailto:gregory.szorc@gmail.com>>> >>> # Date 1432589936 25200 >>> # Mon May 25 14:38:56 2015 -0700 >>> # Node ID ea28b7239085bbc7d561ab6aa6f86bdc19f77a25 >>> # Parent 605b1d32c1c011d56233f28923ee5354fce7e426 >>> changegroup: expose list of changesets from getsubsetraw >>> >>> Currently, when generating changegroups, it is possible >>> for the >>> caller >>> to not know exactly what data is inside without parsing >>> the returned >>> data stream. This is inefficient and adds complexity. >>> >>> Return the list of changesets from getsubsetraw to make >>> this data >>> more widely available. >>> >>> >>> I'm confused. This very data is -provided- to the function. >>> It is in >>> `outgoing.missing`. Why would we need to return it? >>> >>> >>> Err, I may have gotten too carried away with looking at return >>> values >>> and a handful of functions that were named very similarly. I'll >>> send a v2. >>> >>> >>> What function remains that are not taking outgoing. In the bundle2 >>> context I expect about all of them to be in the modern form and take >>> an outgoing object. >>> >>> If it still make sense to modify some of them (And I'm not use it >>> does , since old function we mostly left in place for old caller) >>> you probably want to return the outgoing object that get computed >>> internally. >>> >>> >>> As I'm looking at this again, I think I was somewhat justified. >>> getchangegroup raw calculates the discovery.outgoing instance and passes >>> it into getlocalchangegroupraw. It then gets passed into getsubsetraw, >>> where .missing is passed into bundler.bundle(). I think it is a layering >>> violation for getchangegroupraw (and its non-raw sibling) to assume >>> getsubsetraw uses outgoing.missing for changegroup data. getsubsetraw is >>> the thing passing a list of changesets into the bundle function, so I >>> think getsubsetraw should return info about what is in the changegroup. >>> As crazy as it sounds, we do have to change 6 functions to hook all this >>> plumbing up in a consistent way. >> >> The "outgoing" object represent a set changesets we want in a bundle and >> related "base" we can rely on. It feel like appropriate to pass it all along >> the line. > > I'm expecting a v2 of this, right? Yes. This logic have been written by me under Matt supervision, so I'm expecting Greg to prove to me it is wrong if he wants to over-write it. On the other and this is one of the very first refactoring I did in late 2011 so I won't be surprise if it is wrong :)
Patch
diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py --- a/mercurial/changegroup.py +++ b/mercurial/changegroup.py @@ -549,8 +549,13 @@ def _changegroupinfo(repo, nodes, source for node in nodes: repo.ui.debug("%s\n" % hex(node)) def getsubsetraw(repo, outgoing, bundler, source, fastpath=False): + """Obtain changegroup data. + + Returns a list of binary changeset nodes and an iterator of raw + changegroup data. + """ repo = repo.unfiltered() commonrevs = outgoing.common csets = outgoing.missing heads = outgoing.missingheads @@ -562,12 +567,12 @@ def getsubsetraw(repo, outgoing, bundler repo.filtername is None and heads == sorted(repo.heads())) repo.hook('preoutgoing', throw=True, source=source) _changegroupinfo(repo, csets, source) - return bundler.generate(commonrevs, csets, fastpathlinkrev, source) + return csets, bundler.generate(commonrevs, csets, fastpathlinkrev, source) def getsubset(repo, outgoing, bundler, source, fastpath=False, version='01'): - gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath) + csets, gengroup = getsubsetraw(repo, outgoing, bundler, source, fastpath) return packermap[version][1](util.chunkbuffer(gengroup), 'UN') def changegroupsubset(repo, roots, heads, source, version='01'): """Compute a changegroup consisting of all the nodes that are @@ -602,9 +607,9 @@ def getlocalchangegroupraw(repo, source, precomputed sets in outgoing. Returns a raw changegroup generator.""" if not outgoing.missing: return None bundler = packermap[version][0](repo, bundlecaps) - return getsubsetraw(repo, outgoing, bundler, source) + return getsubsetraw(repo, outgoing, bundler, source)[1] def getlocalchangegroup(repo, source, outgoing, bundlecaps=None): """Like getbundle, but taking a discovery.outgoing as an argument.