Patchwork [1,of,2] context: use normal manifest lookups instead of manifest.find

login
register
mail settings
Submitter Bryan O'Sullivan
Date March 12, 2013, 9:31 p.m.
Message ID <be03cde093c4ab03c92e.1363123891@australite.thefacebook.com>
Download mbox | patch
Permalink /patch/1116/
State Rejected, archived
Headers show

Comments

Bryan O'Sullivan - March 12, 2013, 9:31 p.m.
# HG changeset patch
# User Bryan O'Sullivan <bryano@fb.com>
# Date 1363123858 25200
# Node ID be03cde093c4ab03c92e31fc55d5680c35bd8327
# Parent  8cc8715b36030fc84f17426cc384155eb743a5f7
context: use normal manifest lookups instead of manifest.find

The manifest.find method was not managing the manifest's cache properly.
If that cache had not been primed prior to a lookup, and a lookup missed,
we were not adding the reconstructed entry to the cache.

This caused a huge slowdown in "hg grep", where we would reconstruct a
manifest entry from scratch for every single file in every revision we
were visiting.

In addition, manifest.find was using a Python binary search to scan the
text of the manifest. This is quite a lot slower than our C code for
parsing the entire manifest and simply looking an entry up. That code
made its debut in 2008, so manifest.find is simply bad news from every
perspective at this point.

For a manifest containing 150,000 entries with 6,000 changes in one
revision, this change improves "hg grep" performance from 650 seconds
to 2. It seems possible to me that other commands could be positively
affected, too, but I haven't really looked at which.
Durham Goode - March 12, 2013, 10:32 p.m.
On 3/12/13 2:31 PM, "Bryan O'Sullivan" <bos@serpentine.com> wrote:
>+        if ('_manifest' in self.__dict__ or not
>+            (('_manifestdelta' in self.__dict__ or path in self.files())
>and
>+             path in self._manifestdelta)):

This is a mouthful. Seemed easier to read before. Also, if '_manifest' is
not in self.__dict__, but the second condition succeeds, won't the
self._manifest call below throw?

>+            mf = self._manifest
>+        else:
>+            mf = self._manifestdelta
>+        try:
>+            return mf[path], mf.flags(path)
>+        except KeyError:
>             raise error.LookupError(self._node, path,
>                                     _('not found in manifest'))
>
Bryan O'Sullivan - March 12, 2013, 11:04 p.m.
On Tue, Mar 12, 2013 at 3:32 PM, Durham Goode <durham@fb.com> wrote:

> This is a mouthful. Seemed easier to read before.
>

It's nevertheless the same logic.


> Also, if '_manifest' is
> not in self.__dict__, but the second condition succeeds, won't the
> self._manifest call below throw?
>

No. Why not is an exercise for the reader :-)

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -287,23 +287,18 @@  class changectx(object):
         return troubles
 
     def _fileinfo(self, path):
-        if '_manifest' in self.__dict__:
-            try:
-                return self._manifest[path], self._manifest.flags(path)
-            except KeyError:
-                raise error.LookupError(self._node, path,
-                                        _('not found in manifest'))
-        if '_manifestdelta' in self.__dict__ or path in self.files():
-            if path in self._manifestdelta:
-                return (self._manifestdelta[path],
-                        self._manifestdelta.flags(path))
-        node, flag = self._repo.manifest.find(self._changeset[0], path)
-        if not node:
+        if ('_manifest' in self.__dict__ or not
+            (('_manifestdelta' in self.__dict__ or path in self.files()) and
+             path in self._manifestdelta)):
+            mf = self._manifest
+        else:
+            mf = self._manifestdelta
+        try:
+            return mf[path], mf.flags(path)
+        except KeyError:
             raise error.LookupError(self._node, path,
                                     _('not found in manifest'))
 
-        return node, flag
-
     def filenode(self, path):
         return self._fileinfo(path)[0]