Patchwork [4,of,9,"] discovery: use a lower level but faster way to retrieve parents

login
register
mail settings
Submitter Pierre-Yves David
Date March 5, 2019, 5:39 p.m.
Message ID <ffd98d208aa7f92e13bf.1551807555@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/39083/
State Accepted
Headers show

Comments

Pierre-Yves David - March 5, 2019, 5:39 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1551311787 -3600
#      Thu Feb 28 00:56:27 2019 +0100
# Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
# Parent  82035c1d714f8f3911632ea1271002745fc620f4
# EXP-Topic discovery-speedup
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ffd98d208aa7
discovery: use a lower level but faster way to retrieve parents

We already know that no revision in the undecided set are filtered, so we can
skip multiple checks and directly access lower level data.

In a private pathological case, this improves the timing from about 70 seconds
to about 50 seconds. There are other actions to be taken to improve that case,
however this gives an idea of the general overhead.
Yuya Nishihara - March 6, 2019, 8:35 p.m.
On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1551311787 -3600
> #      Thu Feb 28 00:56:27 2019 +0100
> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
> # EXP-Topic discovery-speedup
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ffd98d208aa7
> discovery: use a lower level but faster way to retrieve parents

> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -165,6 +165,12 @@ class partialdiscovery(object):
>          # common.bases and all its ancestors
>          return self._common.basesheads()
>  
> +    def _parentsgetter(self):
> +        getrev = self._repo.changelog.index.__getitem__
> +        def getparents(r):
> +            return getrev(r)[5:6]
> +        return getparents

Queued, but it's probably better to use revlog._uncheckedparentrevs() than
peeking a raw index entry. Can you send a follow up?
Pierre-Yves David - March 7, 2019, 12:21 p.m.
On 3/6/19 9:35 PM, Yuya Nishihara wrote:
> On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>> # Date 1551311787 -3600
>> #      Thu Feb 28 00:56:27 2019 +0100
>> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
>> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
>> # EXP-Topic discovery-speedup
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ffd98d208aa7
>> discovery: use a lower level but faster way to retrieve parents
> 
>> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>> --- a/mercurial/setdiscovery.py
>> +++ b/mercurial/setdiscovery.py
>> @@ -165,6 +165,12 @@ class partialdiscovery(object):
>>           # common.bases and all its ancestors
>>           return self._common.basesheads()
>>   
>> +    def _parentsgetter(self):
>> +        getrev = self._repo.changelog.index.__getitem__
>> +        def getparents(r):
>> +            return getrev(r)[5:6]
>> +        return getparents
> 
> Queued, but it's probably better to use revlog._uncheckedparentrevs() than
> peeking a raw index entry. Can you send a follow up?

Unfortunately, the overhead of _uncheckedparentrevs is still quite 
significant. (Attribute access and exception handling)

before:               66.831393 seconds (median of 3)
my-change:            49.357107 seconds (median of 3)
_uncheckedparentrevs: 58.941629 seconds (median of 3)

Cheers,
Gregory Szorc - March 7, 2019, 5:54 p.m.
On Thu, Mar 7, 2019 at 4:23 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 3/6/19 9:35 PM, Yuya Nishihara wrote:
> > On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> >> # Date 1551311787 -3600
> >> #      Thu Feb 28 00:56:27 2019 +0100
> >> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
> >> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
> >> # EXP-Topic discovery-speedup
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/
> -r ffd98d208aa7
> >> discovery: use a lower level but faster way to retrieve parents
> >
> >> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> >> --- a/mercurial/setdiscovery.py
> >> +++ b/mercurial/setdiscovery.py
> >> @@ -165,6 +165,12 @@ class partialdiscovery(object):
> >>           # common.bases and all its ancestors
> >>           return self._common.basesheads()
> >>
> >> +    def _parentsgetter(self):
> >> +        getrev = self._repo.changelog.index.__getitem__
> >> +        def getparents(r):
> >> +            return getrev(r)[5:6]
> >> +        return getparents
> >
> > Queued, but it's probably better to use revlog._uncheckedparentrevs()
> than
> > peeking a raw index entry. Can you send a follow up?
>
> Unfortunately, the overhead of _uncheckedparentrevs is still quite
> significant. (Attribute access and exception handling)
>
> before:               66.831393 seconds (median of 3)
> my-change:            49.357107 seconds (median of 3)
> _uncheckedparentrevs: 58.941629 seconds (median of 3)
>

These timings are bonkers and I am quite surprised that attribute access
and exception handling is adding that much overhead! Are we sure there
isn't a `repo.changelog` in a tight loop? Or maybe we're being grossly
inefficient about DAG traversals and missing some caching opportunities?

I agree with Yuya that accessing the raw index entries is undesirable. This
extremely low-level interface will almost certainly not be supported when
we establish a formal storage interface for the changelog. If there are
performance concerns here, we *really* need an inline comment to that
effect. And it would be great if it pointed to a perf* command that could
help measure.

I'll accept this. But without an inline comment and a documented way to
reproduce the performance problem, there's a very good chance we regress
this optimization when establishing a formal interface for changelog.
Please follow up to help mitigate this possibility.
Gregory Szorc - March 7, 2019, 6:03 p.m.
On Thu, Mar 7, 2019 at 9:54 AM Gregory Szorc <gregory.szorc@gmail.com>
wrote:

>
> On Thu, Mar 7, 2019 at 4:23 AM Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
>
>>
>>
>> On 3/6/19 9:35 PM, Yuya Nishihara wrote:
>> > On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
>> >> # HG changeset patch
>> >> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>> >> # Date 1551311787 -3600
>> >> #      Thu Feb 28 00:56:27 2019 +0100
>> >> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
>> >> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
>> >> # EXP-Topic discovery-speedup
>> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/
>> -r ffd98d208aa7
>> >> discovery: use a lower level but faster way to retrieve parents
>> >
>> >> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>> >> --- a/mercurial/setdiscovery.py
>> >> +++ b/mercurial/setdiscovery.py
>> >> @@ -165,6 +165,12 @@ class partialdiscovery(object):
>> >>           # common.bases and all its ancestors
>> >>           return self._common.basesheads()
>> >>
>> >> +    def _parentsgetter(self):
>> >> +        getrev = self._repo.changelog.index.__getitem__
>> >> +        def getparents(r):
>> >> +            return getrev(r)[5:6]
>> >> +        return getparents
>> >
>> > Queued, but it's probably better to use revlog._uncheckedparentrevs()
>> than
>> > peeking a raw index entry. Can you send a follow up?
>>
>> Unfortunately, the overhead of _uncheckedparentrevs is still quite
>> significant. (Attribute access and exception handling)
>>
>> before:               66.831393 seconds (median of 3)
>> my-change:            49.357107 seconds (median of 3)
>> _uncheckedparentrevs: 58.941629 seconds (median of 3)
>>
>
> These timings are bonkers and I am quite surprised that attribute access
> and exception handling is adding that much overhead! Are we sure there
> isn't a `repo.changelog` in a tight loop? Or maybe we're being grossly
> inefficient about DAG traversals and missing some caching opportunities?
>
> I agree with Yuya that accessing the raw index entries is undesirable.
> This extremely low-level interface will almost certainly not be supported
> when we establish a formal storage interface for the changelog. If there
> are performance concerns here, we *really* need an inline comment to that
> effect. And it would be great if it pointed to a perf* command that could
> help measure.
>
> I'll accept this. But without an inline comment and a documented way to
> reproduce the performance problem, there's a very good chance we regress
> this optimization when establishing a formal interface for changelog.
> Please follow up to help mitigate this possibility.
>

... and the very next changeset in this series (c98420914c1 on committed)
says "on our private pathological case, this speeds things up from about 53
seconds to about 7.5 seconds." And 5baf06d2bb4 drops to 6.3s. So it was
indeed a very inefficient algorithm plus lack of caching contributing to
extreme amounts of overhead.

Is this micro optimization needed by the end state of the series? If not,
perhaps it should be reverted or my original comment asking for a follow-up
should be ignored. Whatever the outcome, a perf* command or more details on
how to simulate this performance problem (such as the number of
changesets/heads involved) would really help. Even 6.3s for discovery feels
a bit steep and if we're serious about scaling, we need to be presented
with esoteric use cases like your private repo to consider alternate ways
of discovery.
Pierre-Yves David - March 7, 2019, 6:05 p.m.
On 3/7/19 6:54 PM, Gregory Szorc wrote:
> 
> On Thu, Mar 7, 2019 at 4:23 AM Pierre-Yves David 
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> 
> wrote:
> 
> 
> 
>     On 3/6/19 9:35 PM, Yuya Nishihara wrote:
>      > On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
>      >> # HG changeset patch
>      >> # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>      >> # Date 1551311787 -3600
>      >> #      Thu Feb 28 00:56:27 2019 +0100
>      >> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
>      >> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
>      >> # EXP-Topic discovery-speedup
>      >> # Available At https://bitbucket.org/octobus/mercurial-devel/
>      >> #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r ffd98d208aa7
>      >> discovery: use a lower level but faster way to retrieve parents
>      >
>      >> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>      >> --- a/mercurial/setdiscovery.py
>      >> +++ b/mercurial/setdiscovery.py
>      >> @@ -165,6 +165,12 @@ class partialdiscovery(object):
>      >>           # common.bases and all its ancestors
>      >>           return self._common.basesheads()
>      >>
>      >> +    def _parentsgetter(self):
>      >> +        getrev = self._repo.changelog.index.__getitem__
>      >> +        def getparents(r):
>      >> +            return getrev(r)[5:6]
>      >> +        return getparents
>      >
>      > Queued, but it's probably better to use
>     revlog._uncheckedparentrevs() than
>      > peeking a raw index entry. Can you send a follow up?
> 
>     Unfortunately, the overhead of _uncheckedparentrevs is still quite
>     significant. (Attribute access and exception handling)
> 
>     before:               66.831393 seconds (median of 3)
>     my-change:            49.357107 seconds (median of 3)
>     _uncheckedparentrevs: 58.941629 seconds (median of 3)
> 
> 
> These timings are bonkers and I am quite surprised that attribute access 
> and exception handling is adding that much overhead! Are we sure there 
> isn't a `repo.changelog` in a tight loop? Or maybe we're being grossly 
> inefficient about DAG traversals and missing some caching opportunities?

I used the very same code has the current changesets but only changed 
the _parentgetter function to:

+    def _parentsgetter(self):
+        return self._repo.changelog._uncheckedparentrevs
+
Yuya Nishihara - March 7, 2019, 11:09 p.m.
On Thu, 7 Mar 2019 10:03:54 -0800, Gregory Szorc wrote:
> On Thu, Mar 7, 2019 at 9:54 AM Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
> 
> >
> > On Thu, Mar 7, 2019 at 4:23 AM Pierre-Yves David <
> > pierre-yves.david@ens-lyon.org> wrote:
> >
> >>
> >>
> >> On 3/6/19 9:35 PM, Yuya Nishihara wrote:
> >> > On Tue, 05 Mar 2019 18:39:15 +0100, Pierre-Yves David wrote:
> >> >> # HG changeset patch
> >> >> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> >> >> # Date 1551311787 -3600
> >> >> #      Thu Feb 28 00:56:27 2019 +0100
> >> >> # Node ID ffd98d208aa7f92e13bf233b6d752cd2d292ebbe
> >> >> # Parent  82035c1d714f8f3911632ea1271002745fc620f4
> >> >> # EXP-Topic discovery-speedup
> >> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/
> >> -r ffd98d208aa7
> >> >> discovery: use a lower level but faster way to retrieve parents
> >> >
> >> >> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> >> >> --- a/mercurial/setdiscovery.py
> >> >> +++ b/mercurial/setdiscovery.py
> >> >> @@ -165,6 +165,12 @@ class partialdiscovery(object):
> >> >>           # common.bases and all its ancestors
> >> >>           return self._common.basesheads()
> >> >>
> >> >> +    def _parentsgetter(self):
> >> >> +        getrev = self._repo.changelog.index.__getitem__
> >> >> +        def getparents(r):
> >> >> +            return getrev(r)[5:6]
> >> >> +        return getparents
> >> >
> >> > Queued, but it's probably better to use revlog._uncheckedparentrevs()
> >> than
> >> > peeking a raw index entry. Can you send a follow up?
> >>
> >> Unfortunately, the overhead of _uncheckedparentrevs is still quite
> >> significant. (Attribute access and exception handling)
> >>
> >> before:               66.831393 seconds (median of 3)
> >> my-change:            49.357107 seconds (median of 3)
> >> _uncheckedparentrevs: 58.941629 seconds (median of 3)
> >>
> >
> > These timings are bonkers and I am quite surprised that attribute access
> > and exception handling is adding that much overhead! Are we sure there
> > isn't a `repo.changelog` in a tight loop? Or maybe we're being grossly
> > inefficient about DAG traversals and missing some caching opportunities?
> >
> > I agree with Yuya that accessing the raw index entries is undesirable.
> > This extremely low-level interface will almost certainly not be supported
> > when we establish a formal storage interface for the changelog. If there
> > are performance concerns here, we *really* need an inline comment to that
> > effect. And it would be great if it pointed to a perf* command that could
> > help measure.
> >
> > I'll accept this. But without an inline comment and a documented way to
> > reproduce the performance problem, there's a very good chance we regress
> > this optimization when establishing a formal interface for changelog.
> > Please follow up to help mitigate this possibility.
> >
> 
> ... and the very next changeset in this series (c98420914c1 on committed)
> says "on our private pathological case, this speeds things up from about 53
> seconds to about 7.5 seconds." And 5baf06d2bb4 drops to 6.3s. So it was
> indeed a very inefficient algorithm plus lack of caching contributing to
> extreme amounts of overhead.
> 
> Is this micro optimization needed by the end state of the series? If not,
> perhaps it should be reverted or my original comment asking for a follow-up
> should be ignored. Whatever the outcome, a perf* command or more details on
> how to simulate this performance problem (such as the number of
> changesets/heads involved) would really help. Even 6.3s for discovery feels
> a bit steep and if we're serious about scaling, we need to be presented
> with esoteric use cases like your private repo to consider alternate ways
> of discovery.

I'm also curious about that.

FWIW, if the cost of running Python matters, maybe we can slightly optimize
_uncheckedparentrevs() assuming the given rev is a valid revlog revision.

In [13]: dis.dis(getparents)
  4           0 LOAD_DEREF               0 (getrev)
              3 LOAD_FAST                0 (r)
              6 CALL_FUNCTION            1
              9 LOAD_CONST               1 (5)
             12 LOAD_CONST               2 (6)
             15 SLICE+3
             16 RETURN_VALUE

Micro optimized: return self.index[rev][5:6]

[4]: dis.dis(repo.changelog._uncheckedparentrevs)
715           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                0 (index)
                                  # extra dict lookup compared to getparents()
              6 LOAD_FAST                1 (rev)
              9 BINARY_SUBSCR
             10 LOAD_CONST               1 (5)
             13 LOAD_CONST               2 (6)
             16 SLICE+3
             17 RETURN_VALUE

Original:
In [9]: dis.dis(repo.changelog._uncheckedparentrevs)
704           0 SETUP_EXCEPT            17 (to 20)

705           3 LOAD_FAST                0 (self)
              6 LOAD_ATTR                0 (index)
              9 LOAD_FAST                1 (rev)
             12 BINARY_SUBSCR
             13 STORE_FAST               2 (entry)
             16 POP_BLOCK
             17 JUMP_FORWARD            44 (to 64)

706     >>   20 DUP_TOP
             21 LOAD_GLOBAL              1 (IndexError)
             24 COMPARE_OP              10 (exception match)
             27 POP_JUMP_IF_FALSE       63
             30 POP_TOP
             31 POP_TOP
             32 POP_TOP
707          33 LOAD_FAST                1 (rev)
             36 LOAD_GLOBAL              2 (wdirrev)
             39 COMPARE_OP               2 (==)
             42 POP_JUMP_IF_FALSE       57
708          45 LOAD_GLOBAL              3 (error)
             48 LOAD_ATTR                4 (WdirUnsupported)
             51 RAISE_VARARGS            1
             54 JUMP_FORWARD             0 (to 57)
709     >>   57 RAISE_VARARGS            0
             60 JUMP_FORWARD             1 (to 64)
        >>   63 END_FINALLY

711     >>   64 LOAD_FAST                2 (entry)
             67 LOAD_CONST               1 (5)
             70 BINARY_SUBSCR
             71 LOAD_FAST                2 (entry)
             74 LOAD_CONST               2 (6)
             77 BINARY_SUBSCR
             78 BUILD_TUPLE              2
             81 RETURN_VALUE

Patch

diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
--- a/mercurial/setdiscovery.py
+++ b/mercurial/setdiscovery.py
@@ -165,6 +165,12 @@  class partialdiscovery(object):
         # common.bases and all its ancestors
         return self._common.basesheads()
 
+    def _parentsgetter(self):
+        getrev = self._repo.changelog.index.__getitem__
+        def getparents(r):
+            return getrev(r)[5:6]
+        return getparents
+
     def takequicksample(self, headrevs, size):
         """takes a quick sample of size <size>
 
@@ -181,7 +187,7 @@  class partialdiscovery(object):
         if len(sample) >= size:
             return _limitsample(sample, size)
 
-        _updatesample(None, headrevs, sample, self._repo.changelog.parentrevs,
+        _updatesample(None, headrevs, sample, self._parentsgetter(),
                       quicksamplesize=size)
         return sample
 
@@ -191,10 +197,11 @@  class partialdiscovery(object):
             return list(revs)
         repo = self._repo
         sample = set(repo.revs('heads(%ld)', revs))
+        parentrevs = self._parentsgetter()
 
         # update from heads
         revsheads = sample.copy()
-        _updatesample(revs, revsheads, sample, repo.changelog.parentrevs)
+        _updatesample(revs, revsheads, sample, parentrevs)
 
         # update from roots
         revsroots = set(repo.revs('roots(%ld)', revs))
@@ -209,7 +216,6 @@  class partialdiscovery(object):
         # this by keeping a persistent cache of children across invocations.
         children = {}
 
-        parentrevs = repo.changelog.parentrevs
         for rev in repo.changelog.revs(start=min(revsroots)):
             # Always ensure revision has an entry so we don't need to worry
             # about missing keys.