Submitter | Gregory Szorc |
---|---|
Date | Jan. 2, 2017, 11:57 p.m. |
Message ID | <cd8f8b1b8dbae0cf2be7.1483401479@ubuntu-vm-main> |
Download | mbox | patch |
Permalink | /patch/18082/ |
State | Changes Requested |
Headers | show |
Comments
On Mon, Jan 2, 2017 at 3:57 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1483401236 28800 > # Mon Jan 02 15:53:56 2017 -0800 > # Node ID cd8f8b1b8dbae0cf2be79a732a936f4105881672 > # Parent 96837453d041fe7289dd3e6450aee4961517c5b5 > [RFC] localrepo: support non-zlib compression engines in revlogs > > The final part of integrating the compression manager APIs into > revlog storage is the plumbing for repositories to advertise they > are using non-zlib storage and for revlogs to instantiate a non-zlib > engine. > > Introduced is a format.compengine config option to control the > compression engine to use. It defaults to zlib, which is backwards > compatible. > > localrepo.newreporequirements() looks at this config option and adds > a "compengine*" requirement on the named compression engine if it isn't > "zlib." Doing it at this layer means that existing repositories will > forever create new data as the engine they were initially created > with: only new repositories get the new compression engine. > > The svfs opener options now define a "compengine" key denoting the > preferred compression engine. This comes from the repo's active > requirements. > > A localrepository instance now updates its supported requirements > to include "compengine*" values for compression engines registered > with the compression engine manager. > > Finally, revlog instances have been taught to instantiate non-zlib > compression options if the opener options instruct it to. > > This patch means that we can use zstd compression in revlogs! > > On an `hg pull` conversion of the mozilla-unified repo with no extra > redelta settings (like aggressivemergedeltas), we can see the impact > of zstd vs zlib in revlogs: > > $ hg perfrevlogchunks -c > ! chunk > ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5) > ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6) > > ! chunk batch > ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6) > ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6) > > $ hg perfrevlogchunks -m > ! chunk > ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4) > ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5) > > ! chunk batch > ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4) > ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6) > > $ hg perfrevlog -c -d 1 > ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3) > ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3) > > $ hg perfrevlog -m -d 1000 > ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3) > ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3) > > Only modest wins for the changelog. But manifest reading is > significantly faster. I presume this is because manifest chunks tend > to be larger and zstd's performance advantage is more apparent the > longer it runs. > > debugcreatestreamclonebundle (size in bytes) > zlib: 1,638,852,492 > zstd: 1,680,601,332 > > I'm not sure why zstd on disk is larger. It could be overhead of its > framing headers. It could also be poor delta parent choices, as I > didn't force redelta on the zstd conversion of the repo. > > $ hg bundle -a -t none-v2 (total CPU time) > zlib: 102.79s > zstd: 97.75s > > So, marginal CPU decrease for reading all chunks in all revlogs > (this is somewhat disappointing). > > $ hg bundle -a -t <engine>-v2 (total CPU time) > zlib: 191.59s > zstd: 115.36s > > This last test effectively measures the difference between zlib->zlib > and zstd->zstd for revlogs to bundle. This is a rough approximation of > what a server does during `hg clone`. > > Also, I haven't exhaustively tested and profiled various zstd APIs > for decompression. The C bindings expose a handful of ways to perform > decompression and it's quite possible there are some performance > optimizations on the table there. > > FWIW, I have a local patch series where I enable support for lz4 > revlogs using the same lz4 C extension used by Facebook's lz4revlogs > extension and surprisingly zstd's decompress speed is faster than > lz4! This shouldn't be true because, well, it is assumed lz4 > decompress is insanely fast. I suspect this has more to do with the > lz4 C extension doing something silly, like reinitializing contexts. > But that's a rabbit hole for another day. > > --- > > Things to hash out making this an RFC quality patch: > > * Is using a wildcard in the requirement name acceptable? Perhaps we > should use e.g. compengine=zstd instead? > > * Does the requirement need to be specific to revlogs? > > * Should the requirement name be explicitly declared by the engine > instead of inferred by the engine name? (A extra layer of > abstraction may help here if a single engine ever supports multiple > requirements, like the zstd one will if dictionary compression is > added.) > > * Do more data due diligence and record in commit message. > > * stream clone bundle requirements testing > > * More tests around revlog format (verification of zstd frame header) > > I've been doing a lot of performance testing and analysis of zstd lately. Things I learned or are notable since I wrote this commit message: * The lz4 performance note in the commit message isn't very accurate. There is a small subset of operations where the zstd python bindings are as fast as lz4. I'll strike the comment from the next version. * zlib has checksums built into the compression format with how it is used in hg today. The patches as written do not have zstd writing checksums. * Enabling checksums in zstd appears to have a negligible impact on performance. * Reusing zstd compression and decompression "contexts" can make a significant difference to performance. Having a reusable "compressor" object that allows "context" reuse should increase performance for zstd. * For the changelog, zstd level=1 versus level=3 makes almost no difference on compression ratio but does speed up compression a bit. Now I'm considering per-revlog settings for the compressors. * zstd compression dictionaries speed up *both* compression and decompression. On changelog chunks, dictionaries improve decompress throughput from ~180 MB/s to ~300 MB/s. That's nothing to sneeze at. * When dictionaries are used, zstd level=1 compresses the changelog considerably faster than level=3. ~160 MB/s vs ~27 MB/s. * I was going to hold off seriously investigating compression dictionaries, but since there are massive perf win potentials, I think it should be done sooner than later. * There is still room to optimize parts of the python zstd bindings and for Mercurial to use the optimal APIs. zstd is still faster than zlib though. Things will only get faster :) > diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt > --- a/mercurial/help/config.txt > +++ b/mercurial/help/config.txt > @@ -687,6 +687,11 @@ Example for ``~/.hgrc``:: > ``format`` > ---------- > > +``compengine`` > + Compression engine to use for storage. > + > + Default: ``zlib`` > + > ``usegeneraldelta`` > Enable or disable the "generaldelta" repository format which improves > repository compression by allowing "revlog" to store delta against > arbitrary > diff --git a/mercurial/help/internals/requirements.txt > b/mercurial/help/internals/requirements.txt > --- a/mercurial/help/internals/requirements.txt > +++ b/mercurial/help/internals/requirements.txt > @@ -106,3 +106,12 @@ one manifest per directory (as opposed t > Support for this requirement was added in Mercurial 3.4 (released > August 2015). The requirement is currently experimental and is > disabled by default. > + > +compengine* > +=========== > + > +Denotes a compression engine required to open files in the repository. > +``*`` is a compression engine name, such as ``zstd``. > + > +Support for this requirement was added in Mercurial 4.1 (released > +February 2017). > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py > --- a/mercurial/localrepo.py > +++ b/mercurial/localrepo.py > @@ -284,6 +284,12 @@ class localrepository(object): > else: > self.supported = self._basesupported > > + # Add compression engines. > + for name in util.compengines: > + engine = util.compengines[name] > + if engine.revlogheader: > + self.supported.add('compengine%s' % name) > + > if not self.vfs.isdir(): > if create: > self.requirements = newreporequirements(self) > @@ -397,6 +403,10 @@ class localrepository(object): > self.svfs.options['aggressivemergedeltas'] = > aggressivemergedeltas > self.svfs.options['lazydeltabase'] = not > scmutil.gddeltaconfig(self.ui) > > + for r in self.requirements: > + if r.startswith('compengine'): > + self.svfs.options['compengine'] = r[len('compengine'):] > + > def _writerequirements(self): > scmutil.writerequires(self.vfs, self.requirements) > > @@ -1994,6 +2004,17 @@ def newreporequirements(repo): > if ui.configbool('format', 'dotencode', True): > requirements.add('dotencode') > > + compengine = ui.config('format', 'compengine', 'zlib') > + if compengine not in util.compengines: > + raise error.Abort(_('compression engine %s defined by ' > + 'format.compengine not available') % > compengine, > + hint=_('run "hg debuginstall" to list available > ' > + 'compression engines')) > + > + # zlib is the historical default and doesn't need an explicit > requirement. > + if compengine != 'zlib': > + requirements.add('compengine%s' % compengine) > + > if scmutil.gdinitconfig(ui): > requirements.add('generaldelta') > if ui.configbool('experimental', 'treemanifest', False): > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > --- a/mercurial/revlog.py > +++ b/mercurial/revlog.py > @@ -228,6 +228,7 @@ class revlog(object): > # Mapping of revision integer to full node. > self._nodecache = {nullid: nullrev} > self._nodepos = None > + self._compengine = 'zlib' > > v = REVLOG_DEFAULT_VERSION > opts = getattr(opener, 'options', None) > @@ -244,6 +245,8 @@ class revlog(object): > if 'aggressivemergedeltas' in opts: > self._aggressivemergedeltas = > opts['aggressivemergedeltas'] > self._lazydeltabase = bool(opts.get('lazydeltabase', False)) > + if 'compengine' in opts: > + self._compengine = opts['compengine'] > > if self._chunkcachesize <= 0: > raise RevlogError(_('revlog chunk cache size %r is not > greater ' > @@ -301,7 +304,7 @@ class revlog(object): > > @util.propertycache > def _compressor(self): > - return util.compengines['zlib'].revlogcompressor() > + return util.compengines[self._compengine].revlogcompressor() > > def tip(self): > return self.node(len(self.index) - 2) > diff --git a/tests/test-help.t b/tests/test-help.t > --- a/tests/test-help.t > +++ b/tests/test-help.t > @@ -1223,6 +1223,8 @@ Separate sections from subsections > "format" > -------- > > + "compengine" > + > "usegeneraldelta" > > "dotencode" > diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t > new file mode 100644 > --- /dev/null > +++ b/tests/test-repo-compengines.t > @@ -0,0 +1,78 @@ > +A new repository uses zlib storage, which doesn't require a requirement > + > + $ hg init default > + $ cd default > + $ cat .hg/requires > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ touch foo > + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated > repeated text to trigger compression' > + $ hg debugrevlog -c | grep 0x78 > + 0x78 (x) : 1 (100.00%) > + 0x78 (x) : 110 (100.00%) > + > + $ cd .. > + > +Unknown compression engine to format.compengine aborts > + > + $ hg --config format.compengine=unknown init unknown > + abort: compression engine unknown defined by format.compengine not > available > + (run "hg debuginstall" to list available compression engines) > + [255] > + > +A requirement specifying an unknown compression engine results in bail > + > + $ hg init unknownrequirement > + $ cd unknownrequirement > + $ echo compengineunknown >> .hg/requires > + $ hg log > + abort: repository requires features unknown to this Mercurial: > compengineunknown! > + (see https://mercurial-scm.org/wiki/MissingRequirement for more > information) > + [255] > + > + $ cd .. > + > +#if zstd > + > + $ hg --config format.compengine=zstd init zstd > + $ cd zstd > + $ cat .hg/requires > + compenginezstd > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ touch foo > + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated > repeated text' > + > + $ hg debugrevlog -c | grep 0x28 > + 0x28 : 1 (100.00%) > + 0x28 : 98 (100.00%) > + > + $ cd .. > + > +Specifying a new format.compengine on an existing repo won't introduce > data > +with that engine or a requirement > + > + $ cd default > + $ touch bar > + $ hg --config format.compengine=zstd -q commit -A -m 'add bar with a > lot of repeated repeated repeated text' > + > + $ cat .hg/requires > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ hg debugrevlog -c | grep 0x78 > + 0x78 (x) : 2 (100.00%) > + 0x78 (x) : 199 (100.00%) > + > +#endif >
On Wed, Jan 04, 2017 at 11:18:21PM -0800, Gregory Szorc wrote: > * The lz4 performance note in the commit message isn't very accurate. There > is a small subset of operations where the zstd python bindings are as fast > as lz4. I'll strike the comment from the next version. > > * zlib has checksums built into the compression format with how it is used > in hg today. The patches as written do not have zstd writing checksums. > > * Enabling checksums in zstd appears to have a negligible impact on > performance. > > * Reusing zstd compression and decompression "contexts" can make a > significant difference to performance. Having a reusable "compressor" > object that allows "context" reuse should increase performance for zstd. > > * For the changelog, zstd level=1 versus level=3 makes almost no difference > on compression ratio but does speed up compression a bit. Now I'm > considering per-revlog settings for the compressors. > > * zstd compression dictionaries speed up *both* compression and > decompression. On changelog chunks, dictionaries improve decompress > throughput from ~180 MB/s to ~300 MB/s. That's nothing to sneeze at. > > * When dictionaries are used, zstd level=1 compresses the changelog > considerably faster than level=3. ~160 MB/s vs ~27 MB/s. > > * I was going to hold off seriously investigating compression dictionaries, > but since there are massive perf win potentials, I think it should be done > sooner than later. All these perf information wrt dictionaries make me wonder if there is a corpus of non-english changesets that could be used for some different performance measurements. It's nice that we know things are better for english content, but version control is not exclusive to people writing everything in english. Mike
On Wed, Jan 4, 2017 at 11:27 PM, Mike Hommey <mh@glandium.org> wrote: > On Wed, Jan 04, 2017 at 11:18:21PM -0800, Gregory Szorc wrote: > > * The lz4 performance note in the commit message isn't very accurate. > There > > is a small subset of operations where the zstd python bindings are as > fast > > as lz4. I'll strike the comment from the next version. > > > > * zlib has checksums built into the compression format with how it is > used > > in hg today. The patches as written do not have zstd writing checksums. > > > > * Enabling checksums in zstd appears to have a negligible impact on > > performance. > > > > * Reusing zstd compression and decompression "contexts" can make a > > significant difference to performance. Having a reusable "compressor" > > object that allows "context" reuse should increase performance for zstd. > > > > * For the changelog, zstd level=1 versus level=3 makes almost no > difference > > on compression ratio but does speed up compression a bit. Now I'm > > considering per-revlog settings for the compressors. > > > > * zstd compression dictionaries speed up *both* compression and > > decompression. On changelog chunks, dictionaries improve decompress > > throughput from ~180 MB/s to ~300 MB/s. That's nothing to sneeze at. > > > > * When dictionaries are used, zstd level=1 compresses the changelog > > considerably faster than level=3. ~160 MB/s vs ~27 MB/s. > > > > * I was going to hold off seriously investigating compression > dictionaries, > > but since there are massive perf win potentials, I think it should be > done > > sooner than later. > > All these perf information wrt dictionaries make me wonder if there is a > corpus of non-english changesets that could be used for some different > performance measurements. It's nice that we know things are better for > english content, but version control is not exclusive to people writing > everything in english. Ideally I'd think that dictionaries would be per-repository, calculated from the data within. That would yield ideal compression ratios. As long as we're not transferring compressed frames tied to a specific dictionary across the wire [to a peer without that dictionary], we should be fine. Of course, that introduces the complication of when exactly and how do you compute the dictionary. You need data to seed it, which gets weird for things like clones, since you start from nothing. And dictionaries can change over time if the underlying data changes. So now you are in rewriting revlogs or maintaining multiple dictionaries territory. The zstd frame contains the dictionary ID, so multiple dictionaries is a thing we can support. Other ideas and complications: * Server hands out precomputed dictionary at clone time. Client applies the dictionary when compressing received changegroup data. In this scenario, we can instruct server operators to install a CRON to periodically regenerate the dictionary data. * We seed a dictionary from N popular public VCS repos, bake it into the Mercurial distribution, and defer dynamic dictionary complexities to later. * Computing a dictionary is very CPU intensive. I think it is an order of magnitude slower than compression. It's not something we can just do whenever we feel like it. Also, no matter how many bytes I tell zstd to make the dictionary, it always comes back with something around 100k (at least for mozilla-unified's changelog and manifest). We're not talking about a lot of data for the gain it yields. It's the dictionary management I worry about.
On 01/03/2017 12:57 AM, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1483401236 28800 > # Mon Jan 02 15:53:56 2017 -0800 > # Node ID cd8f8b1b8dbae0cf2be79a732a936f4105881672 > # Parent 96837453d041fe7289dd3e6450aee4961517c5b5 > [RFC] localrepo: support non-zlib compression engines in revlogs Overall the code and the changes looks good to me. I gave my view on most of the question you raised and made a small proposal to rename the config from 'format.compengine' to 'format.compression'. I'm looking forward to see a non-RFC version! > > The final part of integrating the compression manager APIs into > revlog storage is the plumbing for repositories to advertise they > are using non-zlib storage and for revlogs to instantiate a non-zlib > engine. > > Introduced is a format.compengine config option to control the > compression engine to use. It defaults to zlib, which is backwards > compatible. > > localrepo.newreporequirements() looks at this config option and adds > a "compengine*" requirement on the named compression engine if it isn't > "zlib." Doing it at this layer means that existing repositories will > forever create new data as the engine they were initially created > with: only new repositories get the new compression engine. > > The svfs opener options now define a "compengine" key denoting the > preferred compression engine. This comes from the repo's active > requirements. > > A localrepository instance now updates its supported requirements > to include "compengine*" values for compression engines registered > with the compression engine manager. > > Finally, revlog instances have been taught to instantiate non-zlib > compression options if the opener options instruct it to. > > This patch means that we can use zstd compression in revlogs! > > On an `hg pull` conversion of the mozilla-unified repo with no extra > redelta settings (like aggressivemergedeltas), we can see the impact > of zstd vs zlib in revlogs: > > $ hg perfrevlogchunks -c > ! chunk > ! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5) > ! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6) > > ! chunk batch > ! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6) > ! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6) > > $ hg perfrevlogchunks -m > ! chunk > ! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4) > ! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5) > > ! chunk batch > ! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4) > ! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6) > > $ hg perfrevlog -c -d 1 > ! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3) > ! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3) > > $ hg perfrevlog -m -d 1000 > ! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3) > ! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3) > > Only modest wins for the changelog. But manifest reading is > significantly faster. I presume this is because manifest chunks tend > to be larger and zstd's performance advantage is more apparent the > longer it runs. > > debugcreatestreamclonebundle (size in bytes) > zlib: 1,638,852,492 > zstd: 1,680,601,332 > > I'm not sure why zstd on disk is larger. It could be overhead of its > framing headers. It could also be poor delta parent choices, as I > didn't force redelta on the zstd conversion of the repo. > > $ hg bundle -a -t none-v2 (total CPU time) > zlib: 102.79s > zstd: 97.75s > > So, marginal CPU decrease for reading all chunks in all revlogs > (this is somewhat disappointing). > > $ hg bundle -a -t <engine>-v2 (total CPU time) > zlib: 191.59s > zstd: 115.36s > > This last test effectively measures the difference between zlib->zlib > and zstd->zstd for revlogs to bundle. This is a rough approximation of > what a server does during `hg clone`. > > Also, I haven't exhaustively tested and profiled various zstd APIs > for decompression. The C bindings expose a handful of ways to perform > decompression and it's quite possible there are some performance > optimizations on the table there. > > FWIW, I have a local patch series where I enable support for lz4 > revlogs using the same lz4 C extension used by Facebook's lz4revlogs > extension and surprisingly zstd's decompress speed is faster than > lz4! This shouldn't be true because, well, it is assumed lz4 > decompress is insanely fast. I suspect this has more to do with the > lz4 C extension doing something silly, like reinitializing contexts. > But that's a rabbit hole for another day. > > --- > > Things to hash out making this an RFC quality patch: > > * Is using a wildcard in the requirement name acceptable? Perhaps we > should use e.g. compengine=zstd instead? Explicit name of the compengine in the requirement seems clearer to me and better. > * Does the requirement need to be specific to revlogs? I'm not sure what you mean here. Are you thinking about compression applied to non-revlogs file? Do we have any of that right now? What would be the benefit to not use the same compression for all? > > * Should the requirement name be explicitly declared by the engine > instead of inferred by the engine name? (A extra layer of > abstraction may help here if a single engine ever supports multiple > requirements, like the zstd one will if dictionary compression is > added.) Having explicit declaration of the requirement name seems clearer to me (and therefor better). > * Do more data due diligence and record in commit message. As long as we are not moving the default to zstd, I don't think we need the full data set that would be necessary to > * stream clone bundle requirements testing > > * More tests around revlog format (verification of zstd frame header) > > diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt > --- a/mercurial/help/config.txt > +++ b/mercurial/help/config.txt > @@ -687,6 +687,11 @@ Example for ``~/.hgrc``:: > ``format`` > ---------- > > +``compengine`` > + Compression engine to use for storage. > + > + Default: ``zlib`` Any reason to use "compengine" as the name it read a bit obscure to me. What about "compression" ? > + > ``usegeneraldelta`` > Enable or disable the "generaldelta" repository format which improves > repository compression by allowing "revlog" to store delta against arbitrary > diff --git a/mercurial/help/internals/requirements.txt b/mercurial/help/internals/requirements.txt > --- a/mercurial/help/internals/requirements.txt > +++ b/mercurial/help/internals/requirements.txt > @@ -106,3 +106,12 @@ one manifest per directory (as opposed t > Support for this requirement was added in Mercurial 3.4 (released > August 2015). The requirement is currently experimental and is > disabled by default. > + > +compengine* > +=========== > + > +Denotes a compression engine required to open files in the repository. > +``*`` is a compression engine name, such as ``zstd``. > + > +Support for this requirement was added in Mercurial 4.1 (released > +February 2017). Don't forget to also update the wiki page. > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py > --- a/mercurial/localrepo.py > +++ b/mercurial/localrepo.py > @@ -284,6 +284,12 @@ class localrepository(object): > else: > self.supported = self._basesupported > > + # Add compression engines. > + for name in util.compengines: > + engine = util.compengines[name] > + if engine.revlogheader: > + self.supported.add('compengine%s' % name) > + > if not self.vfs.isdir(): > if create: > self.requirements = newreporequirements(self) > @@ -397,6 +403,10 @@ class localrepository(object): > self.svfs.options['aggressivemergedeltas'] = aggressivemergedeltas > self.svfs.options['lazydeltabase'] = not scmutil.gddeltaconfig(self.ui) > > + for r in self.requirements: > + if r.startswith('compengine'): > + self.svfs.options['compengine'] = r[len('compengine'):] > + > def _writerequirements(self): > scmutil.writerequires(self.vfs, self.requirements) > > @@ -1994,6 +2004,17 @@ def newreporequirements(repo): > if ui.configbool('format', 'dotencode', True): > requirements.add('dotencode') > > + compengine = ui.config('format', 'compengine', 'zlib') > + if compengine not in util.compengines: > + raise error.Abort(_('compression engine %s defined by ' > + 'format.compengine not available') % compengine, > + hint=_('run "hg debuginstall" to list available ' > + 'compression engines')) > + > + # zlib is the historical default and doesn't need an explicit requirement. > + if compengine != 'zlib': > + requirements.add('compengine%s' % compengine) > + > if scmutil.gdinitconfig(ui): > requirements.add('generaldelta') > if ui.configbool('experimental', 'treemanifest', False): > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > --- a/mercurial/revlog.py > +++ b/mercurial/revlog.py > @@ -228,6 +228,7 @@ class revlog(object): > # Mapping of revision integer to full node. > self._nodecache = {nullid: nullrev} > self._nodepos = None > + self._compengine = 'zlib' > > v = REVLOG_DEFAULT_VERSION > opts = getattr(opener, 'options', None) > @@ -244,6 +245,8 @@ class revlog(object): > if 'aggressivemergedeltas' in opts: > self._aggressivemergedeltas = opts['aggressivemergedeltas'] > self._lazydeltabase = bool(opts.get('lazydeltabase', False)) > + if 'compengine' in opts: > + self._compengine = opts['compengine'] > > if self._chunkcachesize <= 0: > raise RevlogError(_('revlog chunk cache size %r is not greater ' > @@ -301,7 +304,7 @@ class revlog(object): > > @util.propertycache > def _compressor(self): > - return util.compengines['zlib'].revlogcompressor() > + return util.compengines[self._compengine].revlogcompressor() > > def tip(self): > return self.node(len(self.index) - 2) > diff --git a/tests/test-help.t b/tests/test-help.t > --- a/tests/test-help.t > +++ b/tests/test-help.t > @@ -1223,6 +1223,8 @@ Separate sections from subsections > "format" > -------- > > + "compengine" > + > "usegeneraldelta" > > "dotencode" > diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t > new file mode 100644 > --- /dev/null > +++ b/tests/test-repo-compengines.t > @@ -0,0 +1,78 @@ > +A new repository uses zlib storage, which doesn't require a requirement > + > + $ hg init default > + $ cd default > + $ cat .hg/requires > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ touch foo > + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated repeated text to trigger compression' > + $ hg debugrevlog -c | grep 0x78 > + 0x78 (x) : 1 (100.00%) > + 0x78 (x) : 110 (100.00%) > + > + $ cd .. > + > +Unknown compression engine to format.compengine aborts > + > + $ hg --config format.compengine=unknown init unknown > + abort: compression engine unknown defined by format.compengine not available > + (run "hg debuginstall" to list available compression engines) > + [255] > + > +A requirement specifying an unknown compression engine results in bail > + > + $ hg init unknownrequirement > + $ cd unknownrequirement > + $ echo compengineunknown >> .hg/requires > + $ hg log > + abort: repository requires features unknown to this Mercurial: compengineunknown! > + (see https://mercurial-scm.org/wiki/MissingRequirement for more information) > + [255] > + > + $ cd .. > + > +#if zstd > + > + $ hg --config format.compengine=zstd init zstd > + $ cd zstd > + $ cat .hg/requires > + compenginezstd > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ touch foo > + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated repeated text' > + > + $ hg debugrevlog -c | grep 0x28 > + 0x28 : 1 (100.00%) > + 0x28 : 98 (100.00%) > + > + $ cd .. > + > +Specifying a new format.compengine on an existing repo won't introduce data > +with that engine or a requirement > + > + $ cd default > + $ touch bar > + $ hg --config format.compengine=zstd -q commit -A -m 'add bar with a lot of repeated repeated repeated text' > + > + $ cat .hg/requires > + dotencode > + fncache > + generaldelta > + revlogv1 > + store > + > + $ hg debugrevlog -c | grep 0x78 > + 0x78 (x) : 2 (100.00%) > + 0x78 (x) : 199 (100.00%) > + > +#endif > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
Patch
diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt --- a/mercurial/help/config.txt +++ b/mercurial/help/config.txt @@ -687,6 +687,11 @@ Example for ``~/.hgrc``:: ``format`` ---------- +``compengine`` + Compression engine to use for storage. + + Default: ``zlib`` + ``usegeneraldelta`` Enable or disable the "generaldelta" repository format which improves repository compression by allowing "revlog" to store delta against arbitrary diff --git a/mercurial/help/internals/requirements.txt b/mercurial/help/internals/requirements.txt --- a/mercurial/help/internals/requirements.txt +++ b/mercurial/help/internals/requirements.txt @@ -106,3 +106,12 @@ one manifest per directory (as opposed t Support for this requirement was added in Mercurial 3.4 (released August 2015). The requirement is currently experimental and is disabled by default. + +compengine* +=========== + +Denotes a compression engine required to open files in the repository. +``*`` is a compression engine name, such as ``zstd``. + +Support for this requirement was added in Mercurial 4.1 (released +February 2017). diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py --- a/mercurial/localrepo.py +++ b/mercurial/localrepo.py @@ -284,6 +284,12 @@ class localrepository(object): else: self.supported = self._basesupported + # Add compression engines. + for name in util.compengines: + engine = util.compengines[name] + if engine.revlogheader: + self.supported.add('compengine%s' % name) + if not self.vfs.isdir(): if create: self.requirements = newreporequirements(self) @@ -397,6 +403,10 @@ class localrepository(object): self.svfs.options['aggressivemergedeltas'] = aggressivemergedeltas self.svfs.options['lazydeltabase'] = not scmutil.gddeltaconfig(self.ui) + for r in self.requirements: + if r.startswith('compengine'): + self.svfs.options['compengine'] = r[len('compengine'):] + def _writerequirements(self): scmutil.writerequires(self.vfs, self.requirements) @@ -1994,6 +2004,17 @@ def newreporequirements(repo): if ui.configbool('format', 'dotencode', True): requirements.add('dotencode') + compengine = ui.config('format', 'compengine', 'zlib') + if compengine not in util.compengines: + raise error.Abort(_('compression engine %s defined by ' + 'format.compengine not available') % compengine, + hint=_('run "hg debuginstall" to list available ' + 'compression engines')) + + # zlib is the historical default and doesn't need an explicit requirement. + if compengine != 'zlib': + requirements.add('compengine%s' % compengine) + if scmutil.gdinitconfig(ui): requirements.add('generaldelta') if ui.configbool('experimental', 'treemanifest', False): diff --git a/mercurial/revlog.py b/mercurial/revlog.py --- a/mercurial/revlog.py +++ b/mercurial/revlog.py @@ -228,6 +228,7 @@ class revlog(object): # Mapping of revision integer to full node. self._nodecache = {nullid: nullrev} self._nodepos = None + self._compengine = 'zlib' v = REVLOG_DEFAULT_VERSION opts = getattr(opener, 'options', None) @@ -244,6 +245,8 @@ class revlog(object): if 'aggressivemergedeltas' in opts: self._aggressivemergedeltas = opts['aggressivemergedeltas'] self._lazydeltabase = bool(opts.get('lazydeltabase', False)) + if 'compengine' in opts: + self._compengine = opts['compengine'] if self._chunkcachesize <= 0: raise RevlogError(_('revlog chunk cache size %r is not greater ' @@ -301,7 +304,7 @@ class revlog(object): @util.propertycache def _compressor(self): - return util.compengines['zlib'].revlogcompressor() + return util.compengines[self._compengine].revlogcompressor() def tip(self): return self.node(len(self.index) - 2) diff --git a/tests/test-help.t b/tests/test-help.t --- a/tests/test-help.t +++ b/tests/test-help.t @@ -1223,6 +1223,8 @@ Separate sections from subsections "format" -------- + "compengine" + "usegeneraldelta" "dotencode" diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t new file mode 100644 --- /dev/null +++ b/tests/test-repo-compengines.t @@ -0,0 +1,78 @@ +A new repository uses zlib storage, which doesn't require a requirement + + $ hg init default + $ cd default + $ cat .hg/requires + dotencode + fncache + generaldelta + revlogv1 + store + + $ touch foo + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated repeated text to trigger compression' + $ hg debugrevlog -c | grep 0x78 + 0x78 (x) : 1 (100.00%) + 0x78 (x) : 110 (100.00%) + + $ cd .. + +Unknown compression engine to format.compengine aborts + + $ hg --config format.compengine=unknown init unknown + abort: compression engine unknown defined by format.compengine not available + (run "hg debuginstall" to list available compression engines) + [255] + +A requirement specifying an unknown compression engine results in bail + + $ hg init unknownrequirement + $ cd unknownrequirement + $ echo compengineunknown >> .hg/requires + $ hg log + abort: repository requires features unknown to this Mercurial: compengineunknown! + (see https://mercurial-scm.org/wiki/MissingRequirement for more information) + [255] + + $ cd .. + +#if zstd + + $ hg --config format.compengine=zstd init zstd + $ cd zstd + $ cat .hg/requires + compenginezstd + dotencode + fncache + generaldelta + revlogv1 + store + + $ touch foo + $ hg -q commit -A -m 'initial commit with a lot of repeated repeated repeated text' + + $ hg debugrevlog -c | grep 0x28 + 0x28 : 1 (100.00%) + 0x28 : 98 (100.00%) + + $ cd .. + +Specifying a new format.compengine on an existing repo won't introduce data +with that engine or a requirement + + $ cd default + $ touch bar + $ hg --config format.compengine=zstd -q commit -A -m 'add bar with a lot of repeated repeated repeated text' + + $ cat .hg/requires + dotencode + fncache + generaldelta + revlogv1 + store + + $ hg debugrevlog -c | grep 0x78 + 0x78 (x) : 2 (100.00%) + 0x78 (x) : 199 (100.00%) + +#endif