Patchwork [03,of,10,lazy-changelog-parse] changelog: lazily parse description

Submitter Gregory Szorc
Date March 6, 2016, 11:58 p.m.
Message ID <d85951413907594c2cb3.1457308729@ubuntu-vm-main>
Permalink /patch/13647/
State Accepted
Delegated to: Martin von Zweigbergk

Comments

Gregory Szorc - March 6, 2016, 11:58 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1457303326 28800
#      Sun Mar 06 14:28:46 2016 -0800
# Node ID d85951413907594c2cb37744ce8b01de2b030930
# Parent  45c41cbfe73e7d685a8831cb73e0064eddc6d33e
changelog: lazily parse description

Previously, the description field was converted to a localstr at
parse time. With this patch, we store the raw description and convert
it to a localstr only when it is accessed.
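
A minimal sketch of the pattern described above, not the Mercurial code
itself. The names LazyRevision, tolocal and stats are hypothetical
stand-ins for illustration (the real patch uses changelogrevision and
encoding.tolocal), and the input format is simplified to a single
header line. It shows that parsing keeps only the raw bytes and that
the conversion cost is paid only when description is read:

```python
# Hypothetical stand-in for encoding.tolocal that counts how often
# the conversion actually runs.
stats = {'decodes': 0}


def tolocal(raw):
    stats['decodes'] += 1
    return raw.decode('utf-8')


class LazyRevision(object):
    # 'description' is deliberately absent from __slots__: it is
    # served by the property below, backed by the raw bytes.
    __slots__ = ('_rawdesc', 'user')

    def __init__(self, text):
        # Everything after the first blank line is the description.
        doublenl = text.index(b'\n\n')
        self._rawdesc = text[doublenl + 2:]
        header = text[:doublenl].split(b'\n')
        self.user = header[0].decode('utf-8')

    @property
    def description(self):
        # Converted on every access; nothing is cached in this sketch.
        return tolocal(self._rawdesc)


rev = LazyRevision(b'alice\n\nfix bug in parser')
assert rev.user == 'alice'
assert stats['decodes'] == 0      # parsing alone did not convert
assert rev.description == 'fix bug in parser'
assert stats['decodes'] == 1      # conversion happened on access
```

Callers that only read fields such as user never trigger the
conversion, which is where the saving for description-free revsets
would come from.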

Revsets that don't access the description get faster:

author(mpm)
0.896565
0.914234
0.869085

date(2015)
0.878797
0.891980
0.862525

extra(rebase_source)
0.865446
0.912514
0.871500

author(mpm) or author(greg)
1.801832
1.860402
1.791589

date(2015) or branch(default)
0.968276
0.994673
0.974027

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593

As you can see, most of these revsets are now faster than they were
before this refactoring: we have already offset the performance
loss from introducing the new class that represents parsed
changelog entries!

Patch

diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -147,17 +147,17 @@  class changelogrevision(object):
 
     Changelog revisions consist of multiple pieces of data, including
     the manifest node, user, and date. This object exposes a view into
     the parsed object.
     """
 
     __slots__ = (
         'date',
-        'description',
+        '_rawdesc',
         'extra',
         'files',
         'manifest',
         'user',
     )
 
     def __new__(cls, text):
         if not text:
@@ -180,19 +180,20 @@  class changelogrevision(object):
         # time tz extra\n : date (time is int or float, timezone is int)
         #                 : extra is metadata, encoded and separated by '\0'
         #                 : older versions ignore it
         # files\n\n       : files modified by the cset, no \n or \r allowed
         # (.*)            : comment (free text, ideally utf-8)
         #
         # changelog v0 doesn't use extra
 
-        last = text.index("\n\n")
-        self.description = encoding.tolocal(text[last + 2:])
-        l = text[:last].split('\n')
+        doublenl = text.index('\n\n')
+        self._rawdesc = text[doublenl + 2:]
+
+        l = text[:doublenl].split('\n')
         self.manifest = bin(l[0])
         self.user = encoding.tolocal(l[1])
 
         tdata = l[2].split(' ', 2)
         if len(tdata) != 3:
             time = float(tdata[0])
             try:
                 # various tools did silly things with the time zone field.
@@ -204,16 +205,20 @@  class changelogrevision(object):
             time, timezone = float(tdata[0]), int(tdata[1])
             self.extra = decodeextra(tdata[2])
 
         self.date = (time, timezone)
         self.files = l[3:]
 
         return self
 
+    @property
+    def description(self):
+        return encoding.tolocal(self._rawdesc)
+
 class changelog(revlog.revlog):
     def __init__(self, opener):
         revlog.revlog.__init__(self, opener, "00changelog.i")
         if self._initempty:
             # changelogs don't benefit from generaldelta
             self.version &= ~revlog.REVLOGGENERALDELTA
             self._generaldelta = False
         self._realopener = opener