Patchwork [8,of,9,"] discovery: cache the children mapping used during each discovery

login
register
mail settings
Submitter Pierre-Yves David
Date March 5, 2019, 5:39 p.m.
Message ID <6127533daac7b331ba98.1551807559@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/39085/
State Accepted
Headers show

Comments

Pierre-Yves David - March 5, 2019, 5:39 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1551314900 -3600
#      Thu Feb 28 01:48:20 2019 +0100
# Node ID 6127533daac7b331ba980d70040d55ee938ce106
# Parent  fe57cbd053432d10bccd3955072a4c6c581e3355
# EXP-Topic discovery-speedup
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 6127533daac7
discovery: cache the children mapping used during each discovery

During discovery, the `undecided` set keep shrinking. Therefore, the map
computed for an iteration N will be valid for iteration N+1. Instead of
computing the same data over and over we cache it the first time.

Our private pathological case speed up from about 7.5 seconds to about 6.3
seconds.

(starting from over 70s at the start of the full series)
Yuya Nishihara - March 6, 2019, 8:56 p.m.
On Tue, 05 Mar 2019 18:39:19 +0100, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1551314900 -3600
> #      Thu Feb 28 01:48:20 2019 +0100
> # Node ID 6127533daac7b331ba980d70040d55ee938ce106
> # Parent  fe57cbd053432d10bccd3955072a4c6c581e3355
> # EXP-Topic discovery-speedup
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 6127533daac7
> discovery: cache the children mapping used during each discovery
> 
> During discovery, the `undecided` set keep shrinking. Therefore, the map
> computed for an iteration N will be valid for iteration N+1. Instead of
> computing the same data over and over we cache it the first time.

Can you copy this to the function doc or inline-comment?

>      def addcommons(self, commons):
>          """registrer nodes known as common"""
> @@ -173,15 +174,14 @@ class partialdiscovery(object):
>  
>      def _childrengetter(self, revs):
>  
> +        if self._childrenmap is not None:
> +            return self._childrenmap.__getitem__

Without the comment above, this code would look invalid since our cache
strategy ignores the revs argument (or relying on the fact that undecided
set would never grow.)

Patch

diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
--- a/mercurial/setdiscovery.py
+++ b/mercurial/setdiscovery.py
@@ -116,6 +116,7 @@  class partialdiscovery(object):
         self._common = repo.changelog.incrementalmissingrevs()
         self._undecided = None
         self.missing = set()
+        self._childrenmap = None
 
     def addcommons(self, commons):
         """registrer nodes known as common"""
@@ -173,15 +174,14 @@  class partialdiscovery(object):
 
     def _childrengetter(self, revs):
 
+        if self._childrenmap is not None:
+            return self._childrenmap.__getitem__
+
         # _updatesample() essentially does interaction over revisions to look
         # up their children. This lookup is expensive and doing it in a loop is
         # quadratic. We precompute the children for all relevant revisions and
         # make the lookup in _updatesample() a simple dict lookup.
-        #
-        # Because this function can be called multiple times during discovery,
-        # we may still perform redundant work and there is room to optimize
-        # this by keeping a persistent cache of children across invocations.
-        children = {}
+        self._childrenmap = children = {}
 
         parentrevs = self._parentsgetter()