Submitter | Gregory Szorc |
---|---|
Date | Oct. 14, 2015, 10:34 p.m. |
Message ID | <c927f2b34d0a420d0741.1444862043@gps-mbp.local> |
Download | mbox | patch |
Permalink | /patch/11079/ |
State | Deferred |
Headers | show |
Comments
On 2015-10-15 00:34, Gregory Szorc wrote: [..] > +We can unpack packed1 bundles > + > + $ hg init packed > + $ hg -R packed unbundle packed.hg > + 6 files to transfer, 2.55 KB of data > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > + (run 'hg heads' to see heads, 'hg merge' to merge) > + Potentially stupid idea: Why not implement a special form of the clone command, which reads from such a full "bundle" file and creates the repo from that bundle? $ hg clone packed.hg packed I don't know if - alternatively - some new option would be needed. Maybe like this: $ hg clone --unbundle packed.hg packed > +We can't unpack packed1 bundles on non-empty repos > + > + $ hg -R packed unbundle packed.hg > + abort: cannot apply stream clone bundle on non-empty repo > + [255] > + ..which then could not happen any more
On Wed, Oct 14, 2015 at 4:31 PM, Adrian Buehlmann <adrian@cadifra.com> wrote: > On 2015-10-15 00:34, Gregory Szorc wrote: > [..] > > +We can unpack packed1 bundles > > + > > + $ hg init packed > > + $ hg -R packed unbundle packed.hg > > + 6 files to transfer, 2.55 KB of data > > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > > + (run 'hg heads' to see heads, 'hg merge' to merge) > > + > > Potentially stupid idea: > > Why not implement a special form of the clone command, which reads from > such a full "bundle" file and creates the repo from that bundle? > > $ hg clone packed.hg packed > You can already do this! Although, it doesn't work with this new bundle type because you need to teach bundlerepo.py about different bundle types (it's also broken for bundle2 currently). > > I don't know if - alternatively - some new option would be needed. Maybe > like this: > > $ hg clone --unbundle packed.hg packed > > > +We can't unpack packed1 bundles on non-empty repos > > + > > + $ hg -R packed unbundle packed.hg > > + abort: cannot apply stream clone bundle on non-empty repo > > + [255] > > + > > ..which then could not happen any more > This touches on the subject of whether streaming clone / packed bundles are or aren't bundles. I could argue both perspectives. But if they aren't bundles, then we need to invent new commands and figure out a way to shoehorn them into the new clone bundles feature. I think I'm fine with either way: I just care most about getting streaming clone support into the clone bundles feature otherwise the feature is useless for Mozilla's automation needs.
On 2015-10-15 01:40, Gregory Szorc wrote: > On Wed, Oct 14, 2015 at 4:31 PM, Adrian Buehlmann <adrian@cadifra.com > <mailto:adrian@cadifra.com>> wrote: > > On 2015-10-15 00:34, Gregory Szorc wrote: > [..] > > +We can unpack packed1 bundles > > + > > + $ hg init packed > > + $ hg -R packed unbundle packed.hg > > + 6 files to transfer, 2.55 KB of data > > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > > + (run 'hg heads' to see heads, 'hg merge' to merge) > > + > > Potentially stupid idea: > > Why not implement a special form of the clone command, which reads from > such a full "bundle" file and creates the repo from that bundle? > > $ hg clone packed.hg packed > > > You can already do this! Although, it doesn't work with this new bundle > type because you need to teach bundlerepo.py about different bundle > types (it's also broken for bundle2 currently). This sounds a bit like a contradiction to me... ("You can already do this" vs "Although, it doesn't work...") If it doesn't work, then it should probably be fixed to work with these kind of bundles? These kind of bundles can't be applied to a non-empty repo, so it seems to me that trying to make the unbundle command work with these is a bit of a misfit. The error message in the case below just looked a bit silly to me.. The next question then was: Why add support for these kind of "bundles" to the unbundle command, if they can only be used on empty repos? If you start doing that (adding support for these kind of bundles to the unbundle command), you will have to support it forever... > I don't know if - alternatively - some new option would be needed. Maybe > like this: > > $ hg clone --unbundle packed.hg packed > > > +We can't unpack packed1 bundles on non-empty repos > > + > > + $ hg -R packed unbundle packed.hg > > + abort: cannot apply stream clone bundle on non-empty repo > > + [255] > > + > > ..which then could not happen any more > > > This touches on the subject of whether streaming clone / packed bundles > are or aren't bundles. I could argue both perspectives. But if they > aren't bundles, then we need to invent new commands and figure out a way > to shoehorn them into the new clone bundles feature. I think I'm fine > with either way: I just care most about getting streaming clone support > into the clone bundles feature otherwise the feature is useless for > Mozilla's automation needs. As I understand the matter, these new "full (streamclone) bundles" aren't really bundles to me. They can never be used like a classic bundle. What can you do with such a full bundle? The only command that makes sense is clone: Transform it into a new repo. But perhaps I'm missing your point here, as I'm mostly sitting on the fence. I got (unwillingly) involved into the business of streamcloning because streamcloning ('hg clone --uncompressed') was the preexisting feature which made the fncache file (.hg/store/fncache) necessary. Which in turn was used as the name of the new repo format that I helped introduce quite a while a ago. The only purpose of that repo format was to limit the length of path names under .hg/store, which was biting us on Windows. I then got bitten again when Bryan and myself tortured each other with translating "my" horrible store._hybridencode function into C code, which he found to be needed, because it turned out to be too slow for his important use case. Yikes. Now I see you inventing a new kind of bundles out of this streamclone stuff. Which is quite interesting.
On Wed, Oct 14, 2015 at 03:34:03PM -0700, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1444861540 25200 > # Wed Oct 14 15:25:40 2015 -0700 > # Node ID c927f2b34d0a420d074152464e854901c746527a > # Parent e577b42c94d917eb3fbbe866bf66e455a251bb1b > commands.unbundle: support consuming streaming clone bundles Series looks mostly reasonable to me. I do wonder if it'd be equally good to have a streamgroup part in a bundle2 container, but it's probably not worth the overhead. > > Now that we can produce streaming clone bundles `hg bundle`, it makes > sense to support consuming them with `hg unbundle` too. > > We don't need to perform any sanity checking in the command because the > application code in clonebundles.py does this. So the integration here > is pretty simple. > > diff --git a/mercurial/commands.py b/mercurial/commands.py > --- a/mercurial/commands.py > +++ b/mercurial/commands.py > @@ -6530,8 +6530,11 @@ def unbundle(ui, repo, fname1, *fnames, > tr.release() > changes = [r.get('return', 0) > for r in op.records['changegroup']] > modheads = changegroup.combineresults(changes) > + elif isinstance(gen, streamclone.unbundler): > + gen.apply(repo) > + modheads = len(repo.heads()) > else: > modheads = changegroup.addchangegroup(repo, gen, 'unbundle', > 'bundle:' + fname) > finally: > diff --git a/tests/test-bundle.t b/tests/test-bundle.t > --- a/tests/test-bundle.t > +++ b/tests/test-bundle.t > @@ -277,8 +277,22 @@ packed1 is produced properly > 0010: 00 00 00 00 0a 30 00 09 72 65 76 6c 6f 67 76 31 |.....0..revlogv1| > 0020: 00 64 61 74 61 2f 61 64 69 66 66 65 72 65 6e 74 |.data/adifferent| > 0030: 66 69 6c 65 2e 69 00 31 33 39 0a 00 01 00 01 00 |file.i.139......| > > +We can unpack packed1 bundles > + > + $ hg init packed > + $ hg -R packed unbundle packed.hg > + 6 files to transfer, 2.55 KB of data > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > + (run 'hg heads' to see heads, 'hg merge' to merge) > + > +We can't unpack packed1 bundles on non-empty repos > + > + $ hg -R packed unbundle packed.hg > + abort: cannot apply stream clone bundle on non-empty repo > + [255] > + > Create partial clones > > $ rm -r empty > $ hg init empty > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel
On Thu, Oct 15, 2015 at 10:01 AM, Augie Fackler <raf@durin42.com> wrote: > On Wed, Oct 14, 2015 at 03:34:03PM -0700, Gregory Szorc wrote: > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com> > > # Date 1444861540 25200 > > # Wed Oct 14 15:25:40 2015 -0700 > > # Node ID c927f2b34d0a420d074152464e854901c746527a > > # Parent e577b42c94d917eb3fbbe866bf66e455a251bb1b > > commands.unbundle: support consuming streaming clone bundles > > Series looks mostly reasonable to me. I do wonder if it'd be equally > good to have a streamgroup part in a bundle2 container, but it's > probably not worth the overhead. > This was originally my intent: add a part to bundle2 and be done with it. However, bundle2 seems to add a 10-20% processing overhead when you are as low-level as stream clones. Stream clones are all about minimizing end-to-end clone times and this overhead wasn't acceptable to me. I /think/ there is room to optimize bundle2 reading. But it's a bit of work. Maybe for 3.7. > > > > > Now that we can produce streaming clone bundles `hg bundle`, it makes > > sense to support consuming them with `hg unbundle` too. > > > > We don't need to perform any sanity checking in the command because the > > application code in clonebundles.py does this. So the integration here > > is pretty simple. > > > > diff --git a/mercurial/commands.py b/mercurial/commands.py > > --- a/mercurial/commands.py > > +++ b/mercurial/commands.py > > @@ -6530,8 +6530,11 @@ def unbundle(ui, repo, fname1, *fnames, > > tr.release() > > changes = [r.get('return', 0) > > for r in op.records['changegroup']] > > modheads = changegroup.combineresults(changes) > > + elif isinstance(gen, streamclone.unbundler): > > + gen.apply(repo) > > + modheads = len(repo.heads()) > > else: > > modheads = changegroup.addchangegroup(repo, gen, > 'unbundle', > > 'bundle:' + fname) > > finally: > > diff --git a/tests/test-bundle.t b/tests/test-bundle.t > > --- a/tests/test-bundle.t > > +++ b/tests/test-bundle.t > > @@ -277,8 +277,22 @@ packed1 is produced properly > > 0010: 00 00 00 00 0a 30 00 09 72 65 76 6c 6f 67 76 31 > |.....0..revlogv1| > > 0020: 00 64 61 74 61 2f 61 64 69 66 66 65 72 65 6e 74 > |.data/adifferent| > > 0030: 66 69 6c 65 2e 69 00 31 33 39 0a 00 01 00 01 00 > |file.i.139......| > > > > +We can unpack packed1 bundles > > + > > + $ hg init packed > > + $ hg -R packed unbundle packed.hg > > + 6 files to transfer, 2.55 KB of data > > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > > + (run 'hg heads' to see heads, 'hg merge' to merge) > > + > > +We can't unpack packed1 bundles on non-empty repos > > + > > + $ hg -R packed unbundle packed.hg > > + abort: cannot apply stream clone bundle on non-empty repo > > + [255] > > + > > Create partial clones > > > > $ rm -r empty > > $ hg init empty > > _______________________________________________ > > Mercurial-devel mailing list > > Mercurial-devel@selenic.com > > https://selenic.com/mailman/listinfo/mercurial-devel >
On Thu, Oct 15, 2015 at 12:42 AM, Adrian Buehlmann <adrian@cadifra.com> wrote: > On 2015-10-15 01:40, Gregory Szorc wrote: > > On Wed, Oct 14, 2015 at 4:31 PM, Adrian Buehlmann <adrian@cadifra.com > > <mailto:adrian@cadifra.com>> wrote: > > > > On 2015-10-15 00:34, Gregory Szorc wrote: > > [..] > > > +We can unpack packed1 bundles > > > + > > > + $ hg init packed > > > + $ hg -R packed unbundle packed.hg > > > + 6 files to transfer, 2.55 KB of data > > > + transferred 2.55 KB in *.* seconds (*/sec) (glob) > > > + (run 'hg heads' to see heads, 'hg merge' to merge) > > > + > > > > Potentially stupid idea: > > > > Why not implement a special form of the clone command, which reads > from > > such a full "bundle" file and creates the repo from that bundle? > > > > $ hg clone packed.hg packed > > > > > > You can already do this! Although, it doesn't work with this new bundle > > type because you need to teach bundlerepo.py about different bundle > > types (it's also broken for bundle2 currently). > > This sounds a bit like a contradiction to me... ("You can already do > this" vs "Although, it doesn't work...") > Sorry. You can `hg clone` from a bundle file. However, that only works for HG10 bundle types (changegroup 1 data) today: the command will crash if you attempt to clone from a bundle2 file. It will also crash with stream clones. bundlerepo.py needs some love to make it work with different bundle types. > > If it doesn't work, then it should probably be fixed to work with these > kind of bundles? > > These kind of bundles can't be applied to a non-empty repo, so it seems > to me that trying to make the unbundle command work with these is a bit > of a misfit. > I'm sympathetic to this argument. > > The error message in the case below just looked a bit silly to me.. The > next question then was: Why add support for these kind of "bundles" to > the unbundle command, if they can only be used on empty repos? If you > start doing that (adding support for these kind of bundles to the > unbundle command), you will have to support it forever... > > > I don't know if - alternatively - some new option would be needed. > Maybe > > like this: > > > > $ hg clone --unbundle packed.hg packed > > > > > +We can't unpack packed1 bundles on non-empty repos > > > + > > > + $ hg -R packed unbundle packed.hg > > > + abort: cannot apply stream clone bundle on non-empty repo > > > + [255] > > > + > > > > ..which then could not happen any more > > > > > > This touches on the subject of whether streaming clone / packed bundles > > are or aren't bundles. I could argue both perspectives. But if they > > aren't bundles, then we need to invent new commands and figure out a way > > to shoehorn them into the new clone bundles feature. I think I'm fine > > with either way: I just care most about getting streaming clone support > > into the clone bundles feature otherwise the feature is useless for > > Mozilla's automation needs. > > As I understand the matter, these new "full (streamclone) bundles" > aren't really bundles to me. They can never be used like a classic > bundle. What can you do with such a full bundle? The only command that > makes sense is clone: Transform it into a new repo. > > But perhaps I'm missing your point here, as I'm mostly sitting on the > fence. I got (unwillingly) involved into the business of streamcloning > because streamcloning ('hg clone --uncompressed') was the preexisting > feature which made the fncache file (.hg/store/fncache) necessary. Which > in turn was used as the name of the new repo format that I helped > introduce quite a while a ago. The only purpose of that repo format was > to limit the length of path names under .hg/store, which was biting us > on Windows. I then got bitten again when Bryan and myself tortured each > other with translating "my" horrible store._hybridencode function into C > code, which he found to be needed, because it turned out to be too slow > for his important use case. Yikes. > > Now I see you inventing a new kind of bundles out of this streamclone > stuff. Which is quite interesting. > Stream cloning to me is all about raw clone speed: the client can effectively stream bits from the network to the filesystem. No relatively expensive changegroup reading, delta application, zlib decompression, etc: just the equivalent of `curl | tar`. Mozilla wrote an extension that can download a pre-built stream clone file and apply it. We're hosting these files in S3 and in EC2 we can clone at 50+ MB/s. The result is clone times that are 3-5x faster than traditional, changegroup-based clones. My end goal is to introduce a "packed revlog" data format that is essentially an index + raw revlog data. This is *very* similar to streaming clones bundles of today except there is a file index at the beginning so you can quickly find revlog content for a specific file. I would like to implement a "packed revlog repo" class that can read repositories directly from this single file. When this is implemented, a streaming clone would write a single, large "packed revlog" to disk (as opposed to 1 file per revlog). You would be able to clone as fast as your network and filesystem allowed. (Writing 100,000 files is slow, no matter how small.) You would be able to clone a 1 GB repo in 10s over a 1 Gbps network. https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/d6aeb3b8b512 contains the "stream clone v2" part of that work. My immediate goal is to support stream clones over the new clone bundles feature so core has feature parity with what Mozilla implemented. Honestly, I'd be fine with not supporting unbundling of stream clone "bundles" through any other means. As long as a server admin can produce them and clone bundles can consume them, I'm happy. Considering the limitations of these "bundles" with `hg unbundle`, perhaps it makes sense to go that route...
Patch
diff --git a/mercurial/commands.py b/mercurial/commands.py --- a/mercurial/commands.py +++ b/mercurial/commands.py @@ -6530,8 +6530,11 @@ def unbundle(ui, repo, fname1, *fnames, tr.release() changes = [r.get('return', 0) for r in op.records['changegroup']] modheads = changegroup.combineresults(changes) + elif isinstance(gen, streamclone.unbundler): + gen.apply(repo) + modheads = len(repo.heads()) else: modheads = changegroup.addchangegroup(repo, gen, 'unbundle', 'bundle:' + fname) finally: diff --git a/tests/test-bundle.t b/tests/test-bundle.t --- a/tests/test-bundle.t +++ b/tests/test-bundle.t @@ -277,8 +277,22 @@ packed1 is produced properly 0010: 00 00 00 00 0a 30 00 09 72 65 76 6c 6f 67 76 31 |.....0..revlogv1| 0020: 00 64 61 74 61 2f 61 64 69 66 66 65 72 65 6e 74 |.data/adifferent| 0030: 66 69 6c 65 2e 69 00 31 33 39 0a 00 01 00 01 00 |file.i.139......| +We can unpack packed1 bundles + + $ hg init packed + $ hg -R packed unbundle packed.hg + 6 files to transfer, 2.55 KB of data + transferred 2.55 KB in *.* seconds (*/sec) (glob) + (run 'hg heads' to see heads, 'hg merge' to merge) + +We can't unpack packed1 bundles on non-empty repos + + $ hg -R packed unbundle packed.hg + abort: cannot apply stream clone bundle on non-empty repo + [255] + Create partial clones $ rm -r empty $ hg init empty