Submitter | Bryan O'Sullivan |
---|---|
Date | March 11, 2013, 9:17 p.m. |
Message ID | <58ab71dcf9fbc1326267.1363036678@australite.thefacebook.com> |
Download | mbox | patch |
Permalink | /patch/1110/ |
State | Rejected |
Delegated to: | Augie Fackler |
Headers | show |
Comments
On Mar 11, 2013, at 4:17 PM, Bryan O'Sullivan <bos@serpentine.com> wrote: > # HG changeset patch > # User Bryan O'Sullivan <bryano@fb.com> > # Date 1363036671 25200 > # Node ID 58ab71dcf9fbc1326267811deb13074e595e9667 > # Parent 15e77967d1b3bb5eb9dfb6e7bfe94691d280af74 > manifest: cache parsed fulltext during find > > We were looking up the manifest cache during find, but if it had somehow > not been primed prior to a call, we were *not* adding the looked-up > entry to it. Ouch. Fix looks obviously correct, I'd say ship it. > > This led to a huge slowdown in "hg grep", where we would reconstruct a > manifest entry from scratch for every single file in every revision we > were visiting. > > For a manifest containing 150,000 entries with 6,000 changes in one > revision, this change improves "hg grep" performance from 650 seconds > to 2. It seems likely to me that other commands could be affected, too, > but I haven't really looked at which. > > diff --git a/mercurial/manifest.py b/mercurial/manifest.py > --- a/mercurial/manifest.py > +++ b/mercurial/manifest.py > @@ -107,6 +107,7 @@ class manifest(revlog.revlog): > mapping = self._mancache[node][0] > return mapping.get(f), mapping.flags(f) > text = self.revision(node) > + self._mancache[node] = self.parse(text), array.array('c', text) > start, end = self._search(text, f) > if start == end: > return None, None > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On 3/11/13 2:17 PM, "Bryan O'Sullivan" <bos@serpentine.com> wrote: > >diff --git a/mercurial/manifest.py b/mercurial/manifest.py >--- a/mercurial/manifest.py >+++ b/mercurial/manifest.py >@@ -107,6 +107,7 @@ class manifest(revlog.revlog): > mapping = self._mancache[node][0] > return mapping.get(f), mapping.flags(f) > text = self.revision(node) >+ self._mancache[node] = self.parse(text), array.array('c', text) > start, end = self._search(text, f) > if start == end: > return None, None From looking at the find() function, it looks like it was written with the purpose of not parsing the whole manifest. So I worry that this fix will have a negative affect somewhere else. Looks like the find() function was originally added as part of looking up tags (http://selenic.com/hg/rev/dbdce3b99988). 'hg tags' reads the manifest for each head and looks for the .hgtags file. So 'hg tags' will now be slower by (number of heads) * (speed of self.parse()). If that perf hit is acceptable, we might as well just write find as: def find(self, node, f): mapping = self.read(node) Return mapping.get(f), mapping.flags(f) If that perf hit is not acceptable, we should just change 'hg grep' to use manifest.read() and do the .get() and .flags() itself. Durham
On Mon, Mar 11, 2013 at 5:25 PM, Durham Goode <durham@fb.com> wrote: > > From looking at the find() function, it looks like it was written with the > purpose of not parsing the whole manifest. Yep. It was written back when manifest parsing was performed entirely in Python, and was consequently very slow. About two years later, I wrote the C code for parsing the manifest. At that point, the need to not parse the manifest was likely no longer an issue, except for those hypothetical people using Mercurial without a C compiler. So I worry that this fix will > have a negative affect somewhere else. I haven't been able to find such an effect. That said, I suspect that manifest.find and manifest._search are entirely unnecessary now (except for the --pure folks), and could be done away with. If that perf hit is acceptable, we might as well just write find as: > > def find(self, node, f): > mapping = self.read(node) > return mapping.get(f), mapping.flags(f) > I don't think there is a perf hit, and the above ought to work fine.
Yes, I'd drop find and _search altogether. Bryan O'Sullivan <bos@serpentine.com> wrote: On Mon, Mar 11, 2013 at 5:25 PM, Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote: From looking at the find() function, it looks like it was written with the purpose of not parsing the whole manifest. Yep. It was written back when manifest parsing was performed entirely in Python, and was consequently very slow. About two years later, I wrote the C code for parsing the manifest. At that point, the need to not parse the manifest was likely no longer an issue, except for those hypothetical people using Mercurial without a C compiler. So I worry that this fix will have a negative affect somewhere else. I haven't been able to find such an effect. That said, I suspect that manifest.find and manifest._search are entirely unnecessary now (except for the --pure folks), and could be done away with. If that perf hit is acceptable, we might as well just write find as: def find(self, node, f): mapping = self.read(node) return mapping.get(f), mapping.flags(f) I don't think there is a perf hit, and the above ought to work fine.
Patch
diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -107,6 +107,7 @@ class manifest(revlog.revlog): mapping = self._mancache[node][0] return mapping.get(f), mapping.flags(f) text = self.revision(node) + self._mancache[node] = self.parse(text), array.array('c', text) start, end = self._search(text, f) if start == end: return None, None