Submitter | Gregory Szorc |
---|---|
Date | May 14, 2017, 6:33 a.m. |
Message ID | <03c14dca1ff875ed9f46.1494743607@ubuntu-vm-main> |
Download | mbox | patch |
Permalink | /patch/20618/ |
State | Accepted |
Headers | show |
Comments
On Sat, May 13, 2017 at 11:33:27PM -0700, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1494743508 25200 > # Sat May 13 23:31:48 2017 -0700 > # Node ID 03c14dca1ff875ed9f4650166ada89baa02c5843 > # Parent 78496ac300255e9996b3e282086661afc08af37c > repair: compress bundles with gzip instead of bzip2 (BC) Can you give me an idea of how much bigger the bundles are on disk? Most users never clean these up. > repair.strip() produces up to 2 bundles: a persisted "backup" > bundle (containing stripped data) and a temporary bundle > (containing data that will be re-applied post strip). The > temporary bundle is uncompressed and the backup bundle is > compressed with bzip2. Should that temporary bundle be hit with zstd if available? It's scoped to a single execution (more or less), so it ought to be safe and a nearly-free IO win for minimal CPU. > > I imagine bzip2 was chosen as the compression format for backup > bundles because it yields the best compression. That makes sense: > you don't want backup bundles wasting space. But at the same time, > you want operations to complete fast as well. > > bzip2 is likely 3-5x slower than zlib/gip. If you are compressing > a large amount of data, that speed difference can translate to > seconds. It can also translate to lost megabytes. However, > storage space gets cheaper every year and single core CPU > performance isn't scaling as well as it was 10 years ago when > this default was likely chosen. I think we should go with the > faster compression algorithm at the expense of larger bundles. > > I *would* like to change the default to zstd (if available). However, > that's a significant BC break and I don't feel comfortable changing > it just yet. When we have official zstd revlog support and the > repo is using it, I think it makes sense to use zstd for backup > bundles. But for zlib-based revlogs, it's safer to avoid zstd > for the time being. > > This change does make some `hg strip` operations faster if large > amounts of data are being compressed. However, execution time is > often dominated by creating and applying the temporary bundle, so > the benefit likely isn't night and day. > > diff --git a/mercurial/repair.py b/mercurial/repair.py > --- a/mercurial/repair.py > +++ b/mercurial/repair.py > @@ -43,9 +43,9 @@ def _bundle(repo, bases, heads, node, su > if cgversion != '01': > bundletype = "HG20" > if compress: > - comp = 'BZ' > + comp = 'GZ' > elif compress: > - bundletype = "HG10BZ" > + bundletype = "HG10GZ" > else: > bundletype = "HG10UN" > return bundle2.writebundle(repo.ui, cg, name, bundletype, vfs, > diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t > --- a/tests/test-generaldelta.t > +++ b/tests/test-generaldelta.t > @@ -154,7 +154,7 @@ Test that strip bundle use bundle2 > 0 files updated, 0 files merged, 5 files removed, 0 files unresolved > saved backup bundle to $TESTTMP/aggressive/.hg/strip-backup/1c5d4dc9a8b8-6c68e60c-backup.hg (glob) > $ hg debugbundle .hg/strip-backup/* > - Stream params: sortdict([('Compression', 'BZ')]) > + Stream params: sortdict([('Compression', 'GZ')]) > changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])" > 1c5d4dc9a8b8d6e1750966d343e94db665e7a1e9 > > diff --git a/tests/test-strip.t b/tests/test-strip.t > --- a/tests/test-strip.t > +++ b/tests/test-strip.t > @@ -210,7 +210,7 @@ > summary: b > > $ hg debugbundle .hg/strip-backup/* > - Stream params: sortdict([('Compression', 'BZ')]) > + Stream params: sortdict([('Compression', 'GZ')]) > changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])" > 264128213d290d868c54642d13aeaa3675551a78 > $ hg pull .hg/strip-backup/* > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 5/13/17 23:33, Gregory Szorc wrote: > I*would* like to change the default to zstd (if available). However, > that's a significant BC break and I don't feel comfortable changing > it just yet. When we have official zstd revlog support and the > repo is using it, I think it makes sense to use zstd for backup > bundles. But for zlib-based revlogs, it's safer to avoid zstd > for the time being. Even if the backup bundle uses gzip, could the temp bundle use zstd? IIRC the temp bundle can potentially be much larger than the backup bundle because it contains all of the commits with a rev number higher than the lowest one being stripped. - Siddharth
Patch
diff --git a/mercurial/repair.py b/mercurial/repair.py --- a/mercurial/repair.py +++ b/mercurial/repair.py @@ -43,9 +43,9 @@ def _bundle(repo, bases, heads, node, su if cgversion != '01': bundletype = "HG20" if compress: - comp = 'BZ' + comp = 'GZ' elif compress: - bundletype = "HG10BZ" + bundletype = "HG10GZ" else: bundletype = "HG10UN" return bundle2.writebundle(repo.ui, cg, name, bundletype, vfs, diff --git a/tests/test-generaldelta.t b/tests/test-generaldelta.t --- a/tests/test-generaldelta.t +++ b/tests/test-generaldelta.t @@ -154,7 +154,7 @@ Test that strip bundle use bundle2 0 files updated, 0 files merged, 5 files removed, 0 files unresolved saved backup bundle to $TESTTMP/aggressive/.hg/strip-backup/1c5d4dc9a8b8-6c68e60c-backup.hg (glob) $ hg debugbundle .hg/strip-backup/* - Stream params: sortdict([('Compression', 'BZ')]) + Stream params: sortdict([('Compression', 'GZ')]) changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])" 1c5d4dc9a8b8d6e1750966d343e94db665e7a1e9 diff --git a/tests/test-strip.t b/tests/test-strip.t --- a/tests/test-strip.t +++ b/tests/test-strip.t @@ -210,7 +210,7 @@ summary: b $ hg debugbundle .hg/strip-backup/* - Stream params: sortdict([('Compression', 'BZ')]) + Stream params: sortdict([('Compression', 'GZ')]) changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])" 264128213d290d868c54642d13aeaa3675551a78 $ hg pull .hg/strip-backup/*