Patchwork: Testing very long delta chains

Submitter Matt Mackall
Date Dec. 23, 2015, 5:41 a.m.
Message ID <1450849307.7342.324.camel@selenic.com>
Permalink /patch/12264/
State Not Applicable

Comments

Matt Mackall - Dec. 23, 2015, 5:41 a.m.
On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
> https://www.mercurial-scm.org/wiki/BigRepositories has been updated with a
> link to
> https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
> which is a generaldelta clone of mozilla-central with
> format.aggressivemergedeltas enabled.
> 
> The last manifest delta chain in this repo is over 45,000 entries deep and
> it makes for a good benchmark for testing revlog reading performance.
> 
> Remember: `hg clone --uncompressed` to preserve the delta chains from the
> server or your client will recompute them as part of applying the
> changegroup.

Without my threaded zlib hack:

$ hg perfmanifest 277045
! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)

(25% CPU usage on a CPU with 4 threads)

With my threaded zlib hack (threads = 4):

$ hg perfmanifest 277045
! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)

(50% CPU usage on a CPU with 4 threads)

Things we can do better:

- add a C decompress helper
- that works on lists of buffers
- that calls zlib directly
- that uses threads
- that uses larger buffers
- that uses a faster zlib

(For the last item, the Cloudflare fork of zlib has a faster CRC function that seems to be worth about 20%.)
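The idea behind the threaded hack can be sketched as a standalone Python function (a minimal illustration, not the patch itself; `decompress_chunks` and its signature are invented for the sketch). The key fact is that CPython's zlib module releases the GIL while inflating, so plain threads genuinely overlap:

```python
import threading
import zlib

def decompress_chunks(chunks, nthreads=4):
    """Decompress a list of zlib buffers using several threads.

    CPython's zlib module drops the GIL during decompression, so
    worker threads can run in parallel on a multi-core machine.
    """
    slots = [None] * len(chunks)
    work = list(enumerate(chunks))  # list.pop() is atomic under the GIL

    def worker():
        try:
            while True:
                slot, buf = work.pop()
                slots[slot] = zlib.decompress(buf)
        except IndexError:
            pass  # work list exhausted

    threads = [threading.Thread(target=worker) for _ in range(nthreads - 1)]
    for t in threads:
        t.start()
    worker()  # the calling thread chips in as the Nth worker
    for t in threads:
        t.join()
    return slots
```

As in the patch, results land in pre-allocated slots so the output order matches the input order no matter which thread finishes first.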


# HG changeset patch
# User Matt Mackall <mpm@selenic.com>
# Date 1450727921 21600
#      Mon Dec 21 13:58:41 2015 -0600
# Node ID b56bc1676b5d4a14167be2498921b57f06ddcd69
# Parent  3dea4eae4eebac11741f0c1dc5dcd9c88d8f4554
revlog: thread decompress
Gregory Szorc - Dec. 23, 2015, 7:30 a.m.
On Tue, Dec 22, 2015 at 9:41 PM, Matt Mackall <mpm@selenic.com> wrote:

> On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
> > https://www.mercurial-scm.org/wiki/BigRepositories has been updated with a
> > link to
> > https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
> > which is a generaldelta clone of mozilla-central with
> > format.aggressivemergedeltas enabled.
> >
> > The last manifest delta chain in this repo is over 45,000 entries deep and
> > it makes for a good benchmark for testing revlog reading performance.
> >
> > Remember: `hg clone --uncompressed` to preserve the delta chains from the
> > server or your client will recompute them as part of applying the
> > changegroup.
>
> Without my threaded zlib hack:
>
> $ hg perfmanifest 277045
> ! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)
>
> (25% CPU usage on a CPU with 4 threads)
>
> With my threaded zlib hack (threads = 4):
>
> $ hg perfmanifest 277045
> ! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)
>
> (50% CPU usage on a CPU with 4 threads)
>

Assuming 100% CPU usage, that's still ~240ms, which feels a bit steep. I
think 100ms should be the upper limit, preferably 50ms if we want people to
barely notice the pause. We /might/ get another 2x speedup (~120ms) with
all the changes below. I'm skeptical of getting 4x without limiting delta
chain length, or without moving more things (like I/O and bdiff patching)
out of Python (even if Python is just proxying data between C function
calls). (For the record, I share your hesitation about not establishing a
delta chain length cap.)
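The arithmetic behind those targets is stark: spread a 100ms budget over the benchmark's ~45,000-entry chain and each delta gets only a couple of microseconds for its read, decompress, and patch steps (a back-of-envelope sketch using figures from this thread):

```python
chain_length = 45000   # manifest delta chain depth in the benchmark repo
budget_s = 0.100       # proposed 100ms upper limit for reading a revision

# Per-delta time budget if the whole chain must fit in the 100ms target
per_delta_us = budget_s / chain_length * 1e6
print("%.1f microseconds per delta" % per_delta_us)
```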


>
> Things we can do better:
>
> - add a C decompress helper
> - that works on lists of buffers
> - that calls zlib directly
> - that uses threads
> - that uses larger buffers
> - that uses a faster zlib
>
> (For this last, the cloudflare fork of zlib has a faster CRC function that
> seems to be worth about 20%)
>

From C, this will not be fun because Windows.

Half serious question: what are your thoughts on writing this in Rust? We
can write a Rust library that provides a C ABI which Python can interface
with (either via a Python C extension or via ctypes/cffi). Rust should be
supported on all the platforms we care about. For building/distribution, we
can provide wheels for Windows and OS X so `pip install` "just works" [and
doesn't require a working compiler]. That leaves Linux and other Unixen.
Most distributions these days offer a Rust package or are in the process of
offering one. For the laggards, I assume we'll still have the pure Python
fallback.
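Whatever the implementation language ends up being, the Python side of calling into a C-ABI shared library via ctypes looks the same. Here is a sketch using the system zlib as a stand-in for a hypothetical Rust (or C) helper library; the library-name lookup and the `c_crc32` wrapper are illustrative assumptions, but `crc32` itself is zlib's real entry point:

```python
import ctypes
import ctypes.util
import zlib

# Load the system zlib; a Rust cdylib exposing extern "C" functions
# would be loaded and declared the same way.
libname = ctypes.util.find_library("z") or "libz.so.1"
libz = ctypes.CDLL(libname)

# zlib declares: uLong crc32(uLong crc, const Bytef *buf, uInt len);
libz.crc32.restype = ctypes.c_ulong
libz.crc32.argtypes = [ctypes.c_ulong, ctypes.c_char_p, ctypes.c_uint]

def c_crc32(data):
    crc = libz.crc32(0, None, 0)  # fetch the recommended initial value
    return libz.crc32(crc, data, len(data)) & 0xffffffff

# Sanity check: the C result matches Python's own zlib binding
assert c_crc32(b"hello") == zlib.crc32(b"hello") & 0xffffffff
```

The same pattern works for cffi; only the declaration syntax changes.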


Yuya Nishihara - Dec. 23, 2015, 3:22 p.m.
On Tue, 22 Dec 2015 23:30:20 -0800, Gregory Szorc wrote:
> 
> From C, this will not be fun because Windows.
> 
> Half serious question: what are your thoughts on writing this in Rust? We
> can write a Rust library that provides a C ABI which Python can interface
> with (either via a Python C extension or via ctypes/cffi). Rust should be
> supported on all the platforms we care about. For building/distribution, we
> can provide wheels for Windows and OS X so `pip install` "just works" [and
> doesn't require a working compiler]. That leaves Linux and other Unixen.
> Most distributions these days offer a Rust package or are in the process of
> offering one. For the laggards, I assume we'll still have the pure Python
> fallback.

I like Rust, but I don't think its libraries and packaging story are mature
enough. For example, we'd need a non-std crate for zlib, and its dependency
would be managed by cargo with an explicit version. It would be a headache
to translate that sort of dependency into system packages (e.g. .deb or .rpm).

Also, isn't the 32-bit MSVC ABI still unstable?
Gregory Szorc - Dec. 23, 2015, 5:03 p.m.
> On Dec 23, 2015, at 07:22, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> 
> I like Rust, but I don't think their libraries and packaging stuff are mature
> enough. For example, we'll need a non-std crate for zlib and its dependency
> would be managed by cargo with an explicit version. And, it would be headache
> to translate this sort of dependencies to system packages (e.g. .deb or .rpm).

By the time we're using a faster zlib implementation, we'll likely have a zlib implementation vendored and statically linked. So I think this concern reduces to how hard it is to support linking against the system zlib, if we want to support that at all.

FWIW, the way I see PyPy and Python 3 support playing out is we rewrite the existing C extensions code and produce a shared library that we call into from Python using ctypes/cffi. At that point, the underlying code could be C, C++, Rust, or anything else (or a mix thereof) capable of producing something that conforms to the C ABI.

> 
> Also, 32bit MSVC ABI isn't stable yet?

They are conservative with the "stable" label. Firefox will be shipping a Rust component in 2016. If it's good enough for Firefox, it should be good enough for Mercurial.
Sean Farley - Dec. 23, 2015, 6:15 p.m.
Gregory Szorc <gregory.szorc@gmail.com> writes:

>> On Dec 23, 2015, at 07:22, Yuya Nishihara <yuya@tcha.org> wrote:
>> 
>>> On Tue, 22 Dec 2015 23:30:20 -0800, Gregory Szorc wrote:
>>> From C, this will not be fun because Windows.
>>> 
>>> Half serious question: what are your thoughts on writing this in Rust? We
>>> can write a Rust library that provides a C ABI which Python can interface
>>> with (either via a Python C extension or via ctypes/cffi). Rust should be
>>> supported on all the platforms we care about. For building/distribution, we
>>> can provide wheels for Windows and OS X so `pip install` "just works" [and
>>> doesn't require a working compiler]. That leaves Linux and other Unixen.
>>> Most distributions these days offer a Rust package or are in the process of
>>> offering one. For the laggards, I assume we'll still have the pure Python
>>> fallback.
>> 
>> I like Rust, but I don't think their libraries and packaging stuff are mature
>> enough. For example, we'll need a non-std crate for zlib and its dependency
>> would be managed by cargo with an explicit version. And, it would be headache
>> to translate this sort of dependencies to system packages (e.g. .deb or .rpm).
>
> At the point we're using a faster zlib implementation, we'll likely have a zlib implementation vendored and statically linked. So, I think this concern translates to how how hard it is to support linking against the system zlib, if we want to support that at all.

Please, no Rust. Building and packaging Rust (even on a Mac) is a headache
for me, and I'd rather not deal with it.

> FWIW, the way I see PyPy and Python 3 support playing out is we rewrite the existing C extensions code and produce a shared library that we call into from Python using ctypes/cffi. At that point, the underlying code could be C, C++, Rust, or anything else (or a mix thereof) capable of producing something that conforms to the C ABI.

C, please. I'm totally on board with a shared library.
Bryan O'Sullivan - Dec. 23, 2015, 8:51 p.m.
On Tue, Dec 22, 2015 at 11:30 PM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> Half serious question: what are your thoughts on writing this in Rust?


We're still dragging around a ball and chain in the form of supporting a
7-year-old version of Python, so any language that makes build and install
even more difficult for basically everyone (instead of just for whatever
laggards are still on 2.6) is a complete non-starter, regardless of any
merits it may have.
Matt Mackall - Dec. 23, 2015, 9:59 p.m.
On Tue, 2015-12-22 at 23:30 -0800, Gregory Szorc wrote:
> On Tue, Dec 22, 2015 at 9:41 PM, Matt Mackall <mpm@selenic.com> wrote:
> 
> > 
> > Without my threaded zlib hack:
> > 
> > $ hg perfmanifest 277045
> > ! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)
> > 
> > (25% CPU usage on a CPU with 4 threads)
> > 
> > With my threaded zlib hack (threads = 4):
> > 
> > $ hg perfmanifest 277045
> > ! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000
> > (best of 20)
> > 
> > (50% CPU usage on a CPU with 4 threads)
> > 
> 
> Assuming 100% CPU usage, that's still ~240ms, which feels a bit steep. I
> think 100ms should be the upper limit.

That's not a particularly comfortable limit given:

$ hg debugdata -m 277045 | gzip -9 > a.gz
$ time gunzip < a.gz > /dev/null

real	0m0.142s
user	0m0.140s
sys	0m0.000s

That's only decompressing 4MB:

$ wc a.gz
  16267   89037 4110122 a.gz

(and is inherently hard to multithread)

But Mercurial wants to store chains up to 2x the uncompressed size:

$ gunzip < a.gz | wc
 130845  130854 12868485

So even with threading, that leaves very little room to achieve decent
compression, which very much depends on deltas.
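Plugging those figures together shows how tight the budget is: single-threaded gunzip delivers about 90 MB/s of decompressed output here, while hitting a 100ms target for this chain demands roughly 130 MB/s (a back-of-envelope sketch from the numbers above):

```python
compressed = 4110122      # bytes of a.gz, from `wc a.gz`
uncompressed = 12868485   # bytes of output, from `gunzip < a.gz | wc`
gunzip_s = 0.142          # single-threaded gunzip wall time

actual_mb_s = uncompressed / gunzip_s / 1e6   # what gunzip manages
needed_mb_s = uncompressed / 0.100 / 1e6      # what a 100ms budget demands
print("gunzip: %.0f MB/s; needed: %.0f MB/s" % (actual_mb_s, needed_mb_s))
```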

> From C, this will not be fun because Windows.

Simple worker threads on Windows aren't all that painful.

> Half serious question: what are your thoughts on writing this in Rust?

Sanity check: Rust isn't even in Debian-unstable yet and we have an important
platform where getting a working C compiler is still a headache.

-- 
Mathematics is the supreme nostalgia of our time.
Gregory Szorc - Dec. 24, 2015, 2:33 a.m.
On Wed, Dec 23, 2015 at 1:59 PM, Matt Mackall <mpm@selenic.com> wrote:

> > Half serious question: what are your thoughts on writing this in Rust?
>
> Sanity check: Rust isn't even in Debian-unstable yet


Apparently it's in Debian testing. If nothing else, Firefox shipping a Rust
component should be a forcing function for distributions to offer Rust.


> and we have an important
> platform where getting a working C compiler is still a headache.
>

Is this Windows?

Everyone's observations about the immaturity of Rust's packaging situation
are accurate. That being said, I argue that Rust's distribution situation
is *simpler* than Python's because there is no language runtime dependency
(Python). Yes, you still have shared library dependency issues, but that's
true of Python C extensions today.

Binary distribution of Mercurial *should* be a solved problem on Windows
and OS X, especially now that it looks like we can generate wheels properly
(I need to talk to someone about uploading wheels to PyPI for the 3.7
release).

Binary distribution on Unixen is more difficult. We can partially solve
that by publishing RPMs, debs, etc where needed. (I argue we should be
doing more of this since distros are lethargic about updating to the latest
Mercurial release.) We already have mechanisms to produce RPMs and debs
compatible with ancient distros (like CentOS 6). We even bundle Python 2.7
in some of them! I'd really prefer to stay out of the packaging game too.
But if distros are going to move at a glacial pace, I can argue we have a
responsibility to our users to provide them the opportunity to easily
install a modern Mercurial. I fear that means providing binary packages for
Unixen.

Source distribution on Unixen is just a PITA, both for Python C extensions
and Rust. I agree that Rust is behind Python here. This should change as
Rust's popularity increases. But it will take a while.
Sean Farley - Dec. 24, 2015, 4:38 a.m.
Gregory Szorc <gregory.szorc@gmail.com> writes:

> Source distribution on Unixen is just a PITA, both for Python C extensions
> and Rust. I agree that Rust is behind Python here. This should change as
> Rust's popularity increases. But it will take a while.

That being said, can we un-bitrot Greg Ward's C code?

https://bitbucket.org/gward/xrevlog

Patch

diff -r 3dea4eae4eeb -r b56bc1676b5d mercurial/revlog.py
--- a/mercurial/revlog.py	Mon Dec 21 14:52:18 2015 -0600
+++ b/mercurial/revlog.py	Mon Dec 21 13:58:41 2015 -0600
@@ -17,6 +17,8 @@ 
 import errno
 import os
 import struct
+import threading
+import Queue
 import zlib
 
 # import stuff from node for others to import from revlog
@@ -1132,14 +1134,38 @@ 
             # 2G on Windows
             return [self._chunk(rev, df=df) for rev in revs]
 
-        for rev in revs:
+        slots = [None] * len(revs)
+
+        work = []
+        done = Queue.Queue()
+
+        for slot, rev in enumerate(revs):
             chunkstart = start(rev)
             if inline:
                 chunkstart += (rev + 1) * iosize
             chunklength = length(rev)
-            ladd(decompress(buffer(data, chunkstart - offset, chunklength)))
+            buf = buffer(data, chunkstart - offset, chunklength)
+            if buf and buf[0] == 'x':
+                work.append((slot, buf))
+            else:
+                slots[slot] = decompress(buf)
 
-        return l
+        def worker():
+            try:
+                while True:
+                    slot, buf = work.pop()
+                    slots[slot] = _decompress(buf)
+            except:
+                done.put(1)
+
+        tcount = 4
+        for w in xrange(tcount - 1):
+            threading.Thread(target=worker).start()
+        worker()
+        for w in xrange(tcount):
+            done.get()
+
+        return slots
 
     def _chunkclear(self):
         """Clear the raw chunk cache."""