Patchwork [Bug,5006] New: Quadratic expense in no-op pull

login
register
mail settings
Submitter mercurial-bugs@selenic.com
Date Dec. 15, 2015, 7:29 p.m.
Message ID <bug-5006-285@https.bz.mercurial-scm.org/>
Download mbox | patch
Permalink /patch/12057/
State Not Applicable
Headers show

Comments

mercurial-bugs@selenic.com - Dec. 15, 2015, 7:29 p.m.
https://bz.mercurial-scm.org/show_bug.cgi?id=5006

            Bug ID: 5006
           Summary: Quadratic expense in no-op pull
           Product: Mercurial
           Version: default branch
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: feature
          Priority: wish
         Component: evolution
          Assignee: bugzilla@selenic.com
          Reporter: mpm@selenic.com
                CC: mercurial-devel@selenic.com,
                    pierre-yves.david@ens-lyon.org

When doing a no-op http pull between two copies of the main hg repo, I get the
following behavior:

searching for changes
no changes found
(70 seconds of silence)

Of that 70 seconds, 16 seconds is network activity (transferring all the
markers again) followed by 54 seconds of CPU activity that confuses my
profiler:

http://whatever.computer/pull2.txt

That turns out to be caused by ~2000 calls to mergemarkers with a batch of ~45
markers. That in turn calls self.add() which builds a set from all known
markers. Thus, the time is quadratic in the number of markers.

Here is my set of hacks to show the problem and cache the marker set. I have no
idea if this is correct, but it makes the 54 seconds disappear:

Patch

diff -r f403693d5f7c mercurial/obsolete.py
--- a/mercurial/obsolete.py     Sat Dec 12 21:36:21 2015 -0600
+++ b/mercurial/obsolete.py     Tue Dec 15 13:29:16 2015 -0600
@@ -148,6 +148,7 @@ 
 def _fm0readmarkers(data, off):
     # Loop on markers
     l = len(data)
+    res = []
     while off + _fm0fsize <= l:
         # read fixed part
         cur = data[off:off + _fm0fsize]
@@ -194,8 +195,10 @@ 
                 parents = None

         metadata = tuple(sorted(metadata.iteritems()))
+        res.append((pre, sucs, flags, metadata, date, parents))

-        yield (pre, sucs, flags, metadata, date, parents)
+    print "count", len(res)
+    return res

 def _fm0encodeonemarker(marker):
     pre, sucs, flags, metadata, date, parents = marker
@@ -523,6 +526,8 @@ 
         self.svfs = svfs
         self._version = defaultformat
         self._readonly = readonly
+        self._mergecount = 0
+        self._mergeknown = None

     def __iter__(self):
         return iter(self._all)
@@ -592,7 +597,11 @@ 
         if self._readonly:
             raise error.Abort('creating obsolete markers is not enabled on '
                               'this repo')
-        known = set(self._all)
+
+        if not self._mergeknown:
+            self._mergeknown = set(self._all)
+        known = self._mergeknown
+
         new = []
         for m in markers:
             if m not in known:
@@ -623,7 +632,11 @@ 

         Returns the number of new markers added."""
         version, markers = _readmarkers(data)
-        return self.add(transaction, markers)
+        print "start merge", self._mergecount
+        self._mergecount += 1
+        a = self.add(transaction, markers)
+        print "end", a
+        return a

     @propertycache
     def _all(self):