Patchwork [2,of,2] repoview: bypass changelog method to computed cache key

login
register
mail settings
Submitter Pierre-Yves David
Date Dec. 4, 2015, 11:06 p.m.
Message ID <43caa4b5b82cb482a595.1449270418@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/11828/
State Accepted
Delegated to: Yuya Nishihara
Headers show

Comments

Pierre-Yves David - Dec. 4, 2015, 11:06 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1449267735 28800
#      Fri Dec 04 14:22:15 2015 -0800
# Node ID 43caa4b5b82cb482a59513ee78b5bd96abefb7cd
# Parent  8daa6d98b6f67d39389bbd91d0a034466e1c965d
# EXP-Topic clcachekey
# Available At http://hg.netv6.net/marmoute-wip/mercurial/
#              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r 43caa4b5b82c
repoview: bypass changelog method to computed cache key

Getting the data necessary for the cache key using the changelog/revlog method
adds a significant overhead. Given how simple the underlying implementation is
and often this code is ran, it make sense to violate layering and directly
compute the data.

Testing `hg log` on Mozilla-central, this reduce the time spent on changelog
cache validation by an extra half:

before: 12.2s of 69s
after:   6.1s of 62s

Total speed up from this patch and it's parent is 3x

(With stupid python profiler overhead)

The global speedup without profiler overhead is still there,

Before: 51s
After:  39s (-23%)
Pierre-Yves David - Dec. 4, 2015, 11:18 p.m.
On 12/04/2015 03:06 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1449267735 28800
> #      Fri Dec 04 14:22:15 2015 -0800
> # Node ID 43caa4b5b82cb482a59513ee78b5bd96abefb7cd
> # Parent  8daa6d98b6f67d39389bbd91d0a034466e1c965d
> # EXP-Topic clcachekey
> # Available At http://hg.netv6.net/marmoute-wip/mercurial/
> #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r 43caa4b5b82c
> repoview: bypass changelog method to computed cache key
>
> Getting the data necessary for the cache key using the changelog/revlog method
> adds a significant overhead. Given how simple the underlying implementation is
> and often this code is ran, it make sense to violate layering and directly
> compute the data.
>
> Testing `hg log` on Mozilla-central, this reduce the time spent on changelog
> cache validation by an extra half:
>
> before: 12.2s of 69s
> after:   6.1s of 62s
>
> Total speed up from this patch and it's parent is 3x
>
> (With stupid python profiler overhead)
>
> The global speedup without profiler overhead is still there,
>
> Before: 51s
> After:  39s (-23%)

Disabling my various extensions playing with namespace, the number move to

Before: 33.680
After:  29.290 (-15%)
Pierre-Yves David - Dec. 5, 2015, 12:07 a.m.
On 12/04/2015 03:18 PM, Pierre-Yves David wrote:
>
>
> On 12/04/2015 03:06 PM, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1449267735 28800
>> #      Fri Dec 04 14:22:15 2015 -0800
>> # Node ID 43caa4b5b82cb482a59513ee78b5bd96abefb7cd
>> # Parent  8daa6d98b6f67d39389bbd91d0a034466e1c965d
>> # EXP-Topic clcachekey
>> # Available At http://hg.netv6.net/marmoute-wip/mercurial/
>> #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r
>> 43caa4b5b82c
>> repoview: bypass changelog method to computed cache key
>>
>> Getting the data necessary for the cache key using the
>> changelog/revlog method
>> adds a significant overhead. Given how simple the underlying
>> implementation is
>> and often this code is ran, it make sense to violate layering and
>> directly
>> compute the data.
>>
>> Testing `hg log` on Mozilla-central, this reduce the time spent on
>> changelog
>> cache validation by an extra half:
>>
>> before: 12.2s of 69s
>> after:   6.1s of 62s
>>
>> Total speed up from this patch and it's parent is 3x
>>
>> (With stupid python profiler overhead)
>>
>> The global speedup without profiler overhead is still there,
>>
>> Before: 51s
>> After:  39s (-23%)
>
> Disabling my various extensions playing with namespace, the number move to
>
> Before: 33.680
> After:  29.290 (-15%)

I've ran the revsets benchmark on the "author" revsets, speed up of 14% 
to 20% is confirmed there too.
Yuya Nishihara - Dec. 6, 2015, 2:50 p.m.
On Fri, 04 Dec 2015 15:06:58 -0800, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1449267735 28800
> #      Fri Dec 04 14:22:15 2015 -0800
> # Node ID 43caa4b5b82cb482a59513ee78b5bd96abefb7cd
> # Parent  8daa6d98b6f67d39389bbd91d0a034466e1c965d
> # EXP-Topic clcachekey
> # Available At http://hg.netv6.net/marmoute-wip/mercurial/
> #              hg pull http://hg.netv6.net/marmoute-wip/mercurial/ -r 43caa4b5b82c
> repoview: bypass changelog method to computed cache key
> 
> Getting the data necessary for the cache key using the changelog/revlog method
> adds a significant overhead. Given how simple the underlying implementation is
> and often this code is ran, it make sense to violate layering and directly
> compute the data.

Sounds reasonable. Pushed them to the clowncopter, thanks.

Patch

diff --git a/mercurial/repoview.py b/mercurial/repoview.py
--- a/mercurial/repoview.py
+++ b/mercurial/repoview.py
@@ -298,17 +298,20 @@  class repoview(object):
 
         this changelog must not be used for writing"""
         # some cache may be implemented later
         unfi = self._unfilteredrepo
         unfichangelog = unfi.changelog
+        # bypass call to changelog.method
+        unfiindex = unfichangelog.index
+        unfilen = len(unfiindex) - 1
+        unfinode = unfiindex[unfilen - 1][7]
+
         revs = filterrevs(unfi, self.filtername)
         cl = self._clcache
-        newkey = (len(unfichangelog), unfichangelog.tip(), hash(revs),
-                  unfichangelog._delayed)
-        if cl is not None:
-            if newkey != self._clcachekey:
-                cl = None
+        newkey = (unfilen, unfinode, hash(revs), unfichangelog._delayed)
+        if cl is not None and newkey != self._clcachekey:
+            cl = None
         # could have been made None by the previous if
         if cl is None:
             cl = copy.copy(unfichangelog)
             cl.filteredrevs = revs
             object.__setattr__(self, '_clcache', cl)