Patchwork [3,of,3] revlog: remove cache validation

login
register
mail settings
Submitter Gregory Szorc
Date Nov. 23, 2015, 3:34 a.m.
Message ID <3818d1ea14c24c73f239.1448249650@ubuntu-main>
Download mbox | patch
Permalink /patch/11590/
State Deferred
Delegated to: Pierre-Yves David
Headers show

Comments

Gregory Szorc - Nov. 23, 2015, 3:34 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1448243855 28800
#      Sun Nov 22 17:57:35 2015 -0800
# Node ID 3818d1ea14c24c73f23925dcee21d0f71fb7a8b5
# Parent  9b10f53e1f1a8ab3850577610498dff590ee8357
revlog: remove cache validation

Previously, we likely called _chunkraw() multiple times in order to
ensure it didn't change out from under us. I'm pretty certain this code
had its origins in the days where we attempted to have thread safety of
localrepository and thus revlog instances.

revlog instances are already not thread safe for writing. And, as of
Mercurial 3.6, hgweb uses a separate localrepository instance per
request, so there is no thread contention. We more or less decided
that attempting to make classes like revlog thread safe is a lost
cause.

So, this patch removes thread safety from _chunks. As a result, we make
one less call into _chunkraw() when the initial read isn't serviced
by the cache. This translates to savings of 4 function calls overall
and possibly prevents the creation of an additional buffer view into the
cache. I doubt this translates into any real world performance wins
because decompression will almost certainly dwarf time spent in
_chunks().
Pierre-Yves David - Nov. 23, 2015, 4:40 a.m.
On 11/22/2015 07:34 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1448243855 28800
> #      Sun Nov 22 17:57:35 2015 -0800
> # Node ID 3818d1ea14c24c73f23925dcee21d0f71fb7a8b5
> # Parent  9b10f53e1f1a8ab3850577610498dff590ee8357
> revlog: remove cache validation
>
> Previously, we likely called _chunkraw() multiple times in order to
> ensure it didn't change out from under us. I'm pretty certain this code
> had its origins in the days where we attempted to have thread safety of
> localrepository and thus revlog instances.
>
> revlog instances are already not thread safe for writing. And, as of
> Mercurial 3.6, hgweb uses a separate localrepository instance per
> request, so there is no thread contention. We more or less decided
> that attempting to make classes like revlog thread safe is a lost
> cause.
>
> So, this patch removes thread safety from _chunks. As a result, we make
> one less call into _chunkraw() when the initial read isn't serviced
> by the cache. This translates to savings of 4 function calls overall
> and possibly prevents the creation of an additional buffer view into the
> cache. I doubt this translates into any real world performance wins
> because decompression will almost certainly dwarf time spent in
> _chunks().


The series looks good, but why are we doing all this if this the win is 
unlikely to be relevant?
Gregory Szorc - Nov. 23, 2015, 4:43 a.m.
On Sun, Nov 22, 2015 at 8:40 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/22/2015 07:34 PM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1448243855 28800
>> #      Sun Nov 22 17:57:35 2015 -0800
>> # Node ID 3818d1ea14c24c73f23925dcee21d0f71fb7a8b5
>> # Parent  9b10f53e1f1a8ab3850577610498dff590ee8357
>> revlog: remove cache validation
>>
>> Previously, we likely called _chunkraw() multiple times in order to
>> ensure it didn't change out from under us. I'm pretty certain this code
>> had its origins in the days where we attempted to have thread safety of
>> localrepository and thus revlog instances.
>>
>> revlog instances are already not thread safe for writing. And, as of
>> Mercurial 3.6, hgweb uses a separate localrepository instance per
>> request, so there is no thread contention. We more or less decided
>> that attempting to make classes like revlog thread safe is a lost
>> cause.
>>
>> So, this patch removes thread safety from _chunks. As a result, we make
>> one less call into _chunkraw() when the initial read isn't serviced
>> by the cache. This translates to savings of 4 function calls overall
>> and possibly prevents the creation of an additional buffer view into the
>> cache. I doubt this translates into any real world performance wins
>> because decompression will almost certainly dwarf time spent in
>> _chunks().
>>
>
>
> The series looks good, but why are we doing all this if this the win is
> unlikely to be relevant?
>

I have a few more patches queued up. Starting with a fix for issue4961. I'm
trying to make revlog reading significantly faster by employing cache
magic. I'm holding off on the 4961 fix because I want to be sure my
followup work doesn't necessitate additional refactoring.
Pierre-Yves David - Nov. 23, 2015, 5:14 a.m.
On 11/22/2015 08:43 PM, Gregory Szorc wrote:
> On Sun, Nov 22, 2015 at 8:40 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 11/22/2015 07:34 PM, Gregory Szorc wrote:
>
>         # HG changeset patch
>         # User Gregory Szorc <gregory.szorc@gmail.com
>         <mailto:gregory.szorc@gmail.com>>
>         # Date 1448243855 28800
>         #      Sun Nov 22 17:57:35 2015 -0800
>         # Node ID 3818d1ea14c24c73f23925dcee21d0f71fb7a8b5
>         # Parent  9b10f53e1f1a8ab3850577610498dff590ee8357
>         revlog: remove cache validation
>
>         Previously, we likely called _chunkraw() multiple times in order to
>         ensure it didn't change out from under us. I'm pretty certain
>         this code
>         had its origins in the days where we attempted to have thread
>         safety of
>         localrepository and thus revlog instances.
>
>         revlog instances are already not thread safe for writing. And, as of
>         Mercurial 3.6, hgweb uses a separate localrepository instance per
>         request, so there is no thread contention. We more or less decided
>         that attempting to make classes like revlog thread safe is a lost
>         cause.
>
>         So, this patch removes thread safety from _chunks. As a result,
>         we make
>         one less call into _chunkraw() when the initial read isn't serviced
>         by the cache. This translates to savings of 4 function calls overall
>         and possibly prevents the creation of an additional buffer view
>         into the
>         cache. I doubt this translates into any real world performance wins
>         because decompression will almost certainly dwarf time spent in
>         _chunks().
>
>
>
>     The series looks good, but why are we doing all this if this the win
>     is unlikely to be relevant?
>
>
> I have a few more patches queued up. Starting with a fix for issue4961.
> I'm trying to make revlog reading significantly faster by employing
> cache magic. I'm holding off on the 4961 fix because I want to be sure
> my followup work doesn't necessitate additional refactoring.

Okay, I think it make sense to ré-emit these two patches with a next 
batch, when they will provide actual benefits.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1059,25 +1059,18 @@  class revlog(object):
         length = self.length
         inline = self._inline
         iosize = self._io.size
         buffer = util.buffer
 
         l = []
         ladd = l.append
 
-        # preload the cache
         try:
-            while True:
-                # ensure that the cache doesn't change out from under us
-                _cache = self._chunkcache
-                self._chunkraw(revs[0], revs[-1], df=df)[1]
-                if _cache == self._chunkcache:
-                    break
-            offset, data = _cache
+            offset, data = self._chunkraw(revs[0], revs[-1], df=df)
         except OverflowError:
             # issue4215 - we can't cache a run of chunks greater than
             # 2G on Windows
             return [self._chunk(rev, df=df) for rev in revs]
 
         for rev in revs:
             chunkstart = start(rev)
             if inline: