Submitter | Gregory Szorc |
---|---|
Date | Nov. 5, 2016, 4:41 p.m. |
Message ID | <1258a74819e5c2b4904d.1478364074@ubuntu-vm-main> |
Download | mbox | patch |
Permalink | /patch/17350/ |
State | Accepted |
Headers | show |
Comments
On Sat, 05 Nov 2016 09:41:14 -0700, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1478363887 25200 > # Sat Nov 05 09:38:07 2016 -0700 > # Branch stable > # Node ID 1258a74819e5c2b4904d11c71231e3b3f33a0aa1 > # Parent e5cc44ea12de681d971fcbebb65a7fb71fd1c3c7 > hgweb: cache fctx.parents() in annotate command (issue5414) Queued this, thanks. > Looking at the code of basefilectx.parents(), there is room for > further improvements. Notably, we still perform redundant calls to > filelog.renamed() and basefilectx._parentfilectx(). And > basefilectx.annotate() also makes similar calls, so there is potential > for object reuse. However, introducing caches here are not appropriate > for the stable branch. Perhaps fctx.parents() can be property-cached, but we'll need to drop uninteresting chains of parents in fctx.annotate().
Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900: > Perhaps fctx.parents() can be property-cached, but we'll need to drop > uninteresting chains of parents in fctx.annotate(). If we go the property-cache approach, I think it's better to cache "_adjustedlinkrev". It's at a lower level and covers both "parents" and "introrev". Caching "parents" may increase memory usage unintentionally. I don't fully get what "uninteresting chains of parents" means here. In the annotate case, let's say f1, f2 = f0.parents(). Both f1 and f2 have _descendantrev set to f0's adjusted linkrev. Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is adjusted from the direct child (f0), then it is "interesting" and can be cached. This is similar to what marmoute said during the sprint - for the log -f or annotate case, once the first fctx's introrev is known, the cache can be used to calculate the ancestors' adjusted linkrevs.
On Sun, 6 Nov 2016 17:01:05 +0000, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900: > > Perhaps fctx.parents() can be property-cached, but we'll need to drop > > uninteresting chains of parents in fctx.annotate(). > > If we go the property-cache approach, I think it's better to cache > "_adjustedlinkrev". It's at a lower level and covers both "parents" > and "introrev". Caching "parents" may increase memory usage unintentionally. > > I don't fully get what "uninteresting chains of parents" means here. > In the annotate case, let's say f1, f2 = f0.parents(). > Both f1 and f2 have _descendantrev set to f0's adjusted linkrev. As you said, what's in my mind was the memory usage. Caching fctx.parents() would mean annotate() builds a full link from self to root nodes. Some of these intermediate nodes aren't useful for hgweb. > Suppose there is a global cache dict: {(path, filenode, srcrev): linkrev}, I > think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev is > adjusted from the direct child (f0), then it is "interesting" and can be > cached. This is similar to what marmoute said during the sprint - for the > log -f or annotate case, once the first fctx's introrev is known, the cache > can be used to calculate the ancestors' adjusted linkrevs. Given we have ugly hacks to pass ancestry data around fctx objects, a global cache might be useful.
On Mon, Nov 7, 2016 at 4:42 AM, Yuya Nishihara <yuya@tcha.org> wrote: > On Sun, 6 Nov 2016 17:01:05 +0000, Jun Wu wrote: > > Excerpts from Yuya Nishihara's message of 2016-11-06 11:31:04 +0900: > > > Perhaps fctx.parents() can be property-cached, but we'll need to drop > > > uninteresting chains of parents in fctx.annotate(). > > > > If we go the property-cache approach, I think it's better to cache > > "_adjustedlinkrev". It's at a lower level and covers both "parents" > > and "introrev". Caching "parents" may increase memory usage > unintentionally. > > > > I don't fully get what "uninteresting chains of parents" means here. > > In the annotate case, let's say f1, f2 = f0.parents(). > > Both f1 and f2 have _descendantrev set to f0's adjusted linkrev. > > As you said, what's in my mind was the memory usage. Caching fctx.parents() > would mean annotate() builds a full link from self to root nodes. Some of > these intermediate nodes aren't useful for hgweb. > > > Suppose there is a global cache dict: {(path, filenode, srcrev): > linkrev}, I > > think if srcrev=_descendantrev (it's true for f1, f2) and _descendantrev > is > > adjusted from the direct child (f0), then it is "interesting" and can be > > cached. This is similar to what marmoute said during the sprint - for the > > log -f or annotate case, once the first fctx's introrev is known, the > cache > > can be used to calculate the ancestors' adjusted linkrevs. > > Given we have ugly hacks to pass ancestry data around fctx objects, a > global > cache might be useful. > Could we change basefilectx.annotate() to return a rich data structure instead of a list of tuples? That data structure could have the cached parents and other reusable cache data (which could be passed into subsequent calls if needed).
Excerpts from Gregory Szorc's message of 2016-11-07 09:29:40 -0800: > Could we change basefilectx.annotate() to return a rich data structure > instead of a list of tuples? That data structure could have the cached > parents and other reusable cache data (which could be passed into > subsequent calls if needed). It's already returning "fctx", which is rich...
On Mon, 7 Nov 2016 17:32:40 +0000, Jun Wu wrote: > Excerpts from Gregory Szorc's message of 2016-11-07 09:29:40 -0800: > > Could we change basefilectx.annotate() to return a rich data structure > > instead of a list of tuples? That data structure could have the cached > > parents and other reusable cache data (which could be passed into > > subsequent calls if needed). > > It's already returning "fctx", which is rich... True. To be clear, I don't like the idea to make the API more complicated just because our rich data structure "fctx" has a poor performance.
Patch
diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py --- a/mercurial/hgweb/webcommands.py +++ b/mercurial/hgweb/webcommands.py @@ -861,12 +861,24 @@ def annotate(web, req, tmpl): f = fctx.path() parity = paritygen(web.stripecount) + # parents() is called once per line and several lines likely belong to + # same revision. So it is worth caching. + # TODO there are still redundant operations within basefilectx.parents() + # and from the fctx.annotate() call itself that could be cached. + parentscache = {} def parents(f): - for p in f.parents(): - yield { - "node": p.hex(), - "rev": p.rev(), - } + rev = f.rev() + if rev not in parentscache: + parentscache[rev] = [] + for p in f.parents(): + entry = { + 'node': p.hex(), + 'rev': p.rev(), + } + parentscache[rev].append(entry) + + for p in parentscache[rev]: + yield p def annotate(**map): if util.binary(fctx.data()):