| Submitter | Gregory Szorc |
| --- | --- |
| Date | Jan. 2, 2017, 11:57 p.m. |
| Message ID | <96837453d041fe7289dd.1483401478@ubuntu-vm-main> |
| Download | mbox, patch |
| Permalink | /patch/18081/ |
| State | Changes Requested |
Comments
On 01/03/2017 12:57 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1483395249 28800
> #      Mon Jan 02 14:14:09 2017 -0800
> # Node ID 96837453d041fe7289dd3e6450aee4961517c5b5
> # Parent  da38ab97f1f4d312966431a16d1f2acb90b75242
> revlog: use compression engine APIs for decompression
>
> Now that compression engines declare their header in revlog chunks
> and can decompress revlog chunks, we refactor revlog.decompress()
> to use them.
>
> Making full use of the property that revlog compressor objects are
> reusable, revlog instances now maintain a dict mapping an engine's
> revlog header to a compressor object. This is not only a performance
> optimization for engines where compressor object reuse can result in
> better performance, but it also serves as a cache of header values
> so we don't need to perform redundant lookups against the compression
> engine manager. (Yes, I measured and the overhead of a function call
> versus a dict lookup was observed.)
>
> This change effectively replaces manual, inline lookup code with a
> dict lookup plus extra function call where compression is actually
> used. On the mozilla-unified repository:
>
> $ hg perfrevlogchunks -c
> ! chunk
> ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
> ! wall 2.003241 comb 2.000000 user 1.960000 sys 0.040000 (best of 5)
> ! chunk batch
> ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
> ! wall 1.866939 comb 1.850000 user 1.840000 sys 0.010000 (best of 6)
>
> $ hg perfrevlogchunks -m
> ! chunk
> ! wall 2.592354 comb 2.600000 user 2.560000 sys 0.040000 (best of 4)
> ! wall 2.733626 comb 2.730000 user 2.670000 sys 0.060000 (best of 4)
> ! chunk batch
> ! wall 2.469384 comb 2.480000 user 2.420000 sys 0.060000 (best of 4)
> ! wall 2.580613 comb 2.590000 user 2.550000 sys 0.040000 (best of 4)
>
> We can see this change has a negative impact on performance.
>
> There are 360,210 changelog entries (359,658 or 99.85% zlib compressed
> and 552 with the 'u' header). The mean time per decompress() in
> non-batch mode increased from 5.428us to 5.561us, a difference of
> 0.13us or 2.5%.
>
> There are 359,342 manifest entries (242,566 or 67.50% zlib compressed,
> 116,602 or 32.45% with '\0' header, and 174 or 0.05% that are empty or
> have the 'u' header). The mean time per decompress() in non-batch mode
> increased from 7.214us to 7.607us, a difference of 0.393us or 5.4%.
>
> If we reinsert inline code for "if t == 'x': return decompress(data)",
> performance remains unchanged or even becomes faster than before
> (if we avoid the checks for '\0' and 'u' before 'x'). My conclusion is
> that attribute lookups and function calls in Python can matter for
> performance. We've known this to be true, which is why tight loops
> leverage local variables to avoid attribute lookups. But I'll be honest:
> I was surprised to see this come into play for decompression: I would
> have expected time spent in the decompressor to dwarf Python time.
>
> Anyway, `hg perfrevlogchunks` is a microbenchmark. It's a good tool
> to measure raw performance, but it may not be indicative of real world
> behavior. For example, the performance loss on manifests as a
> percentage is troubling. But very few operations need to decompress
> every manifest chunk. And when they do, they typically have to perform
> other CPU intensive behavior like applying the diff. So the performance
> loss on manifests doesn't concern me as much.
>
> What about changelogs? Again, there are only so many operations that
> need to read every changelog revision. Fortunately, we have other
> perf commands to measure the impact on these operations:
>
> $ hg perfrevlogrevision -c -d 1
> ! wall 4.624776 comb 4.620000 user 4.600000 sys 0.020000 (best of 3)
> ! wall 4.711967 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)
>
> $ hg perfrevset 'author(mpm)'
> ! wall 9.611942 comb 9.600000 user 9.570000 sys 0.030000 (best of 3)
> ! wall 9.915843 comb 9.900000 user 9.860000 sys 0.040000 (best of 3)
>
> $ hg perfrevset 'author(mpm)' --contexts
> ! wall 9.930018 comb 9.930000 user 9.900000 sys 0.030000 (best of 3)
> ! wall 10.007671 comb 10.000000 user 9.970000 sys 0.030000 (best of 3)
>
> Again, we see performance regressions.
>
> I feel bad about introducing a performance regression. But, I can
> argue that's the price to pay for an abstraction. I suppose we could
> restore the inline code for zlib (and presumably zstd someday) so we
> don't go through the compression engine API. We could have "tier 1"
> compression formats where we support fast/inline operations while still
> allowing extensions to implement their own, fully capable compression
> engines.

The regression percentages are noticeable enough (I've seen patches for
less) that I think hardcoding the known common case makes sense to avoid
the regression. That is not super nice, but it provides a good way to
move forward (in that great direction) without

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
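Gregory's observation that a function call through the engine manager is measurably slower than a plain dict lookup is easy to reproduce outside of Mercurial. The sketch below is illustrative only: `EngineManager` and `forheader` are invented stand-ins for the real `util.compengines.forrevlogheader()` API, and the timings will vary by machine.

```python
import timeit
import zlib

class EngineManager(object):
    """Hypothetical engine manager: one attribute lookup plus one
    function call per chunk, analogous to consulting the compression
    engine manager on every decompress()."""
    def forheader(self, t):
        return zlib.decompress

manager = EngineManager()

# The cached-dict style from the patch: one dict lookup per chunk.
decompressors = {'x': zlib.decompress}

data = zlib.compress(b'some revlog chunk payload' * 10)

def via_function_call():
    return manager.forheader('x')(data)

def via_dict_lookup():
    return decompressors['x'](data)

# Both dispatch styles must produce identical output.
assert via_function_call() == via_dict_lookup()

t_call = timeit.timeit(via_function_call, number=100000)
t_dict = timeit.timeit(via_dict_lookup, number=100000)
print('attribute+call: %.3fs  dict lookup: %.3fs' % (t_call, t_dict))
```

The per-chunk difference is tiny, but as the perfrevlogchunks numbers above show, it adds up when decompress() runs hundreds of thousands of times.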
Patch
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -18,7 +18,6 @@ import errno
 import hashlib
 import os
 import struct
-import zlib
 
 # import stuff from node for others to import from revlog
 from .node import (
@@ -39,7 +38,6 @@ from . import (
 
 _pack = struct.pack
 _unpack = struct.unpack
-_decompress = zlib.decompress
 
 # revlog header flags
 REVLOGV0 = 0
@@ -298,6 +296,8 @@ class revlog(object):
         self._chunkclear()
         # revnum -> (chain-length, sum-delta-length)
         self._chaininfocache = {}
+        # revlog header -> revlog compressor
+        self._decompressors = {}
 
     @util.propertycache
     def _compressor(self):
@@ -1374,17 +1374,43 @@ class revlog(object):
         """
         if not data:
             return data
+
+        # Revlogs are read much more frequently than they are written, so
+        # performance of this function is important.
+        #
+        # We can make a few assumptions about revlogs:
+        #
+        # 1) the majority of chunks will be compressed (as opposed to inline
+        #    raw data).
+        # 2) decompressing *any* data will likely be at least 10x slower than
+        #    returning raw inline data.
+        #
+        # It follows that we want to optimize the "decompress compressed data"
+        # case over "return inline data." The code below does this by putting
+        # the self._decompressors lookup first and *then* checks for our
+        # special cases of the ``\0`` and ``u`` headers.
+        #
+        # For cases where data is actually compressed, this avoids comparison
+        # against the "null" decompressors and allows us to dispatch to the
+        # decompressor quicker. According to `hg perfrevlogchunks`, this is
+        # ~0.5% faster.
         t = data[0]
-        if t == '\0':
-            return data
-        if t == 'x':
+        try:
+            compressor = self._decompressors[t]
+        except KeyError:
+            if t == '\0':
+                return data
+            elif t == 'u':
+                return util.buffer(data, 1)
+
             try:
-                return _decompress(data)
-            except zlib.error as e:
-                raise RevlogError(_('revlog decompress error: %s') % str(e))
-        if t == 'u':
-            return util.buffer(data, 1)
-        raise RevlogError(_('unknown compression type %r') % t)
+                engine = util.compengines.forrevlogheader(t)
+                compressor = engine.revlogcompressor()
+                self._decompressors[t] = compressor
+            except KeyError:
+                raise RevlogError(_('unknown compression type %r') % t)
+
+        return compressor.decompress(data)
 
     def _isgooddelta(self, d, textlen):
         """Returns True if the given delta is good. Good means that it is within