Patchwork [3 of 4] changelog: optionally record seen files when adding revisions

Submitter: Gregory Szorc
Date: July 10, 2015, 12:09 a.m.
Message ID: <c9abd93973708d02d7a9.1436486940@gps-mbp.local>
Permalink: /patch/9947/
State: Changes Requested

Comments

Gregory Szorc - July 10, 2015, 12:09 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1436485116 25200
#      Thu Jul 09 16:38:36 2015 -0700
# Node ID c9abd93973708d02d7a945cbddaba9b420a18cfa
# Parent  c4c7e9382652ddcbfc53ffc4fd00673c725068c3
changelog: optionally record seen files when adding revisions

Measurements revealed that computing the set of changed files from
addchangegroup can consume a non-trivial amount of CPU and can result in
a multi-second pause between applying changesets and manifests.

This patch enables the changelog to optionally record the set of seen
files as revisions are added to it. The next patch will explain this in
more detail.
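
The opt-in pattern can be illustrated outside Mercurial. The following is a minimal sketch, assuming a simplified store whose _addrevision() receives the full entry text directly; ToyRevlog, ToyChangelog and makeentry are hypothetical names, not Mercurial APIs. The entry layout mirrors the list built in changelog.add() in the diff (manifest hex, user, date, sorted files, an empty line, description). The actual patch instead asks revlog._addrevision() to return the text (returntext=True) and parses it with self._newchangelog().

```python
# Minimal sketch, not Mercurial code. ToyRevlog and ToyChangelog are
# hypothetical stand-ins for revlog.revlog and changelog.changelog.


class ToyRevlog(object):
    def __init__(self):
        self.texts = []

    def _addrevision(self, text):
        # Store the fulltext and return its index as a stand-in "node".
        self.texts.append(text)
        return len(self.texts) - 1


class ToyChangelog(ToyRevlog):
    def __init__(self):
        super(ToyChangelog, self).__init__()
        # When set (e.g. to set()), tracks which files have been seen.
        self.seenfiles = None

    @staticmethod
    def _files(text):
        # Layout mirrors changelog.add(): manifest hex, user, date,
        # sorted file names, an empty line, then the description.
        lines = text.split("\n")
        return lines[3:lines.index("")]

    def _addrevision(self, text):
        node = super(ToyChangelog, self)._addrevision(text)
        if self.seenfiles is not None:
            # Parse only when tracking was requested; otherwise skip the cost.
            self.seenfiles.update(self._files(text))
        return node


def makeentry(files, desc):
    # Hypothetical helper joining fields the way changelog.add() does.
    fields = ["0" * 40, "alice", "0 0"] + sorted(files) + ["", desc]
    return "\n".join(fields)


cl = ToyChangelog()
cl.seenfiles = set()
cl._addrevision(makeentry(["b.txt", "a.txt"], "first"))
cl._addrevision(makeentry(["a.txt", "c/d.py"], "second\n\nbody"))
print(sorted(cl.seenfiles))  # ['a.txt', 'b.txt', 'c/d.py']
```

Leaving seenfiles as None keeps the default path identical to the parent class, so callers that do not need the file set pay only for the extra method call.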

While this adds an extra function call to every changelog insertion,
measurements show no measurable impact on mozilla-central unbundle
times. My guess is that the function call overhead is negligible next to
the cost of Python processing a few hundred megabytes of data.
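
To get a feel for why one extra call per revision can disappear in the noise, here is a rough timeit sketch. It does not reproduce the mozilla-central measurement; work(), wrapped() and the 10,000-line payload are arbitrary assumptions, and timings will vary by machine and interpreter.

```python
import timeit


def work(text):
    # Stand-in for real per-revision work: touch every line of the data.
    return len(text.split("\n"))


def wrapped(text):
    # One extra Python-level indirection, like the _addrevision override.
    return work(text)


# Arbitrary payload size, chosen only so that data handling dominates.
payload = "\n".join("file%d.txt" % i for i in range(10000))

direct = timeit.timeit(lambda: work(payload), number=200)
extra = timeit.timeit(lambda: wrapped(payload), number=200)
print("direct: %.4fs  wrapped: %.4fs" % (direct, extra))
```

With a payload of this size, the difference between the two timings is typically within run-to-run variation, which is consistent with the reasoning above, though it is not proof of it.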

Patch

diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -135,8 +135,10 @@  class changelog(revlog.revlog):
         self._delayed = False
         self._delaybuf = None
         self._divert = False
         self.filteredrevs = frozenset()
+        # When set, tracks which files have been seen.
+        self.seenfiles = None
 
     def tip(self):
         """filtered version of revlog.tip"""
         for i in xrange(len(self) -1, -2, -1):
@@ -385,8 +387,20 @@  class changelog(revlog.revlog):
         l = [hex(manifest), user, parseddate] + sorted(files) + ["", desc]
         text = "\n".join(l)
         return self.addrevision(text, transaction, len(self), p1, p2)
 
+    def _addrevision(self, *args, **kwargs):
+        # We have separate calls to _addrevision because we want to avoid
+        # calculating returntext unless it is requested, since it isn't free.
+        if self.seenfiles is not None:
+            node, text = super(changelog, self)._addrevision(returntext=True,
+                                                             *args, **kwargs)
+            cl = self._newchangelog(text)
+            self.seenfiles.update(cl[3])
+            return node
+        else:
+            return super(changelog, self)._addrevision(*args, **kwargs)
+
     def branchinfo(self, rev):
         """return the branch name and open/close state of a revision
 
         This function exists because creating a changectx object