Patchwork [3,of,7] context.status: move manifest caching trick to _buildstatus()

login
register
mail settings
Submitter Martin von Zweigbergk
Date Nov. 2, 2014, 10:13 p.m.
Message ID <ada8ab4086be78146223.1414966427@handduk2.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/6545/
State Accepted
Commit 39eb9f78f968d1ca9720ee215fae8bfb776587fe
Headers show

Comments

Martin von Zweigbergk - Nov. 2, 2014, 10:13 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@gmail.com>
# Date 1413097213 25200
#      Sun Oct 12 00:00:13 2014 -0700
# Branch stable
# Node ID ada8ab4086be7814622374c04aa5ab6c94deacb2
# Parent  3776f277e512ecdc5b2da73f3f3264af3e173640
context.status: move manifest caching trick to _buildstatus()

In basectx._buildstatus(), we read the manifests for the two revisions
being compared. For "caching reasons" unknown to me, it is better to
read the earlier manifest first, which basectx._prestatus() takes care
of. However, if the 'self' context is a committablectx and the 'other'
context is the parent of the working directory (as in the very common
case of plain "hg status"), there is no need to read any manifests at
all -- all that's needed is the dirstate status. To avoid reading the
manifests, _prestatus() is overridden in committablectx and avoids
calling its super method, and _buildstatus() calls its super method
only if the 'other' context is not the parent of the working
directory.

It seems easier to follow what's happening if we move the pre-fetching
to _buildstatus() just before the place where the manifests are
fetched. We just need to add an extra check that the revision is not
None to handle the case that was previously handled by subclass
overriding. That also makes it safe for committablectx._prestatus() to
call its parent, although the latter now becomes empty, so we won't
bother.
Sean Farley - Nov. 3, 2014, 6:39 p.m.
Martin von Zweigbergk writes:

> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@gmail.com>
> # Date 1413097213 25200
> #      Sun Oct 12 00:00:13 2014 -0700
> # Branch stable
> # Node ID ada8ab4086be7814622374c04aa5ab6c94deacb2
> # Parent  3776f277e512ecdc5b2da73f3f3264af3e173640
> context.status: move manifest caching trick to _buildstatus()
>
> In basectx._buildstatus(), we read the manifests for the two revisions
> being compared. For "caching reasons" unknown to me,

If I recall correctly, this "caching reason" was due to reading the
revlog from disk.
Martin von Zweigbergk - Nov. 3, 2014, 7:02 p.m.
On Mon Nov 03 2014 at 10:39:39 AM Sean Farley <sean.michael.farley@gmail.com>
wrote:

>
> Martin von Zweigbergk writes:
>
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@gmail.com>
> > # Date 1413097213 25200
> > #      Sun Oct 12 00:00:13 2014 -0700
> > # Branch stable
> > # Node ID ada8ab4086be7814622374c04aa5ab6c94deacb2
> > # Parent  3776f277e512ecdc5b2da73f3f3264af3e173640
> > context.status: move manifest caching trick to _buildstatus()
> >
> > In basectx._buildstatus(), we read the manifests for the two revisions
> > being compared. For "caching reasons" unknown to me,
>
> If I recall correctly, this "caching reason" was due to reading the
> revlog from disk.
>

Yes, my guess was that it was something about caching while reading the
revlog, but thanks for confirming.
Matt Mackall - Nov. 10, 2014, 10:33 p.m.
On Mon, 2014-11-03 at 19:02 +0000, Martin von Zweigbergk wrote:
> On Mon Nov 03 2014 at 10:39:39 AM Sean Farley <sean.michael.farley@gmail.com>
> wrote:
> 
> >
> > Martin von Zweigbergk writes:
> >
> > > # HG changeset patch
> > > # User Martin von Zweigbergk <martinvonz@gmail.com>
> > > # Date 1413097213 25200
> > > #      Sun Oct 12 00:00:13 2014 -0700
> > > # Branch stable
> > > # Node ID ada8ab4086be7814622374c04aa5ab6c94deacb2
> > > # Parent  3776f277e512ecdc5b2da73f3f3264af3e173640
> > > context.status: move manifest caching trick to _buildstatus()
> > >
> > > In basectx._buildstatus(), we read the manifests for the two revisions
> > > being compared. For "caching reasons" unknown to me,
> >
> > If I recall correctly, this "caching reason" was due to reading the
> > revlog from disk.
> >
> 
> Yes, my guess was that it was something about caching while reading the
> revlog, but thanks for confirming.

More specifically, if you have revisions 1000 and 1001, 1001 is probably
stored as a delta against 1000. Thus, if you read 1000 first, we'll
reconstruct 1000 and cache it so that when you read 1001, we just need
to apply a delta to what's in the cache. So that's one full
reconstruction + one delta application.

If you go the other way, it's not possible to reverse-apply deltas
(because deletions are just stored as spans to delete), so you end up
doing two full reconstructions. 

(In the general case of walking a full file history backwards, this
becomes quadratic.)
Martin von Zweigbergk - Nov. 10, 2014, 10:55 p.m.
On Mon Nov 10 2014 at 2:33:57 PM Matt Mackall <mpm@selenic.com> wrote:

> On Mon, 2014-11-03 at 19:02 +0000, Martin von Zweigbergk wrote:
> > On Mon Nov 03 2014 at 10:39:39 AM Sean Farley <
> sean.michael.farley@gmail.com>
> > wrote:
> >
> > >
> > > Martin von Zweigbergk writes:
> > >
> > > > # HG changeset patch
> > > > # User Martin von Zweigbergk <martinvonz@gmail.com>
> > > > # Date 1413097213 25200
> > > > #      Sun Oct 12 00:00:13 2014 -0700
> > > > # Branch stable
> > > > # Node ID ada8ab4086be7814622374c04aa5ab6c94deacb2
> > > > # Parent  3776f277e512ecdc5b2da73f3f3264af3e173640
> > > > context.status: move manifest caching trick to _buildstatus()
> > > >
> > > > In basectx._buildstatus(), we read the manifests for the two
> revisions
> > > > being compared. For "caching reasons" unknown to me,
> > >
> > > If I recall correctly, this "caching reason" was due to reading the
> > > revlog from disk.
> > >
> >
> > Yes, my guess was that it was something about caching while reading the
> > revlog, but thanks for confirming.
>
> More specifically, if you have revisions 1000 and 1001, 1001 is probably
> stored as a delta against 1000. Thus, if you read 1000 first, we'll
> reconstruct 1000 and cache it so that when you read 1001, we just need
> to apply a delta to what's in the cache. So that's one full
> reconstruction + one delta application.
>

I did not think about the delta application overhead. Thanks for explaining.
Pierre-Yves David - Nov. 11, 2014, 12:23 a.m.
On 11/10/2014 10:55 PM, Martin von Zweigbergk wrote:
>
>
> On Mon Nov 10 2014 at 2:33:57 PM Matt Mackall <mpm@selenic.com
> <mailto:mpm@selenic.com>> wrote:
>
>     On Mon, 2014-11-03 at 19:02 +0000, Martin von Zweigbergk wrote:
>      > On Mon Nov 03 2014 at 10:39:39 AM Sean Farley
>     <sean.michael.farley@gmail.com <mailto:sean.michael.farley@gmail.com>__>
>      > wrote:
>      >
>      > >
>      > > Martin von Zweigbergk writes:
>      > >
>      > > > # HG changeset patch
>      > > > # User Martin von Zweigbergk <martinvonz@gmail.com
>     <mailto:martinvonz@gmail.com>>
>      > > > # Date 1413097213 25200
>      > > > #      Sun Oct 12 00:00:13 2014 -0700
>      > > > # Branch stable
>      > > > # Node ID ada8ab4086be7814622374c04aa5ab__6c94deacb2
>      > > > # Parent  3776f277e512ecdc5b2da73f3f3264__af3e173640
>      > > > context.status: move manifest caching trick to _buildstatus()
>      > > >
>      > > > In basectx._buildstatus(), we read the manifests for the two
>     revisions
>      > > > being compared. For "caching reasons" unknown to me,
>      > >
>      > > If I recall correctly, this "caching reason" was due to reading the
>      > > revlog from disk.
>      > >
>      >
>      > Yes, my guess was that it was something about caching while
>     reading the
>      > revlog, but thanks for confirming.
>
>     More specifically, if you have revisions 1000 and 1001, 1001 is probably
>     stored as a delta against 1000. Thus, if you read 1000 first, we'll
>     reconstruct 1000 and cache it so that when you read 1001, we just need
>     to apply a delta to what's in the cache. So that's one full
>     reconstruction + one delta application.
>
>
> I did not think about the delta application overhead. Thanks for explaining.


This should probably end up as a comment in a the code to share the 
wisdom with future reader.
Martin von Zweigbergk - Nov. 11, 2014, 12:38 a.m.
Sounds good to me. I think you're the one who picked up this series. Do you
want me to resend?

On Mon Nov 10 2014 at 4:24:04 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/10/2014 10:55 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Mon Nov 10 2014 at 2:33:57 PM Matt Mackall <mpm@selenic.com
> > <mailto:mpm@selenic.com>> wrote:
> >
> >     On Mon, 2014-11-03 at 19:02 +0000, Martin von Zweigbergk wrote:
> >      > On Mon Nov 03 2014 at 10:39:39 AM Sean Farley
> >     <sean.michael.farley@gmail.com <mailto:sean.michael.farley@gmail.com
> >__>
> >      > wrote:
> >      >
> >      > >
> >      > > Martin von Zweigbergk writes:
> >      > >
> >      > > > # HG changeset patch
> >      > > > # User Martin von Zweigbergk <martinvonz@gmail.com
> >     <mailto:martinvonz@gmail.com>>
> >      > > > # Date 1413097213 25200
> >      > > > #      Sun Oct 12 00:00:13 2014 -0700
> >      > > > # Branch stable
> >      > > > # Node ID ada8ab4086be7814622374c04aa5ab__6c94deacb2
> >      > > > # Parent  3776f277e512ecdc5b2da73f3f3264__af3e173640
> >      > > > context.status: move manifest caching trick to _buildstatus()
> >      > > >
> >      > > > In basectx._buildstatus(), we read the manifests for the two
> >     revisions
> >      > > > being compared. For "caching reasons" unknown to me,
> >      > >
> >      > > If I recall correctly, this "caching reason" was due to reading
> the
> >      > > revlog from disk.
> >      > >
> >      >
> >      > Yes, my guess was that it was something about caching while
> >     reading the
> >      > revlog, but thanks for confirming.
> >
> >     More specifically, if you have revisions 1000 and 1001, 1001 is
> probably
> >     stored as a delta against 1000. Thus, if you read 1000 first, we'll
> >     reconstruct 1000 and cache it so that when you read 1001, we just
> need
> >     to apply a delta to what's in the cache. So that's one full
> >     reconstruction + one delta application.
> >
> >
> > I did not think about the delta application overhead. Thanks for
> explaining.
>
>
> This should probably end up as a comment in a the code to share the
> wisdom with future reader.
>
> --
> Pierre-Yves David
>
Pierre-Yves David - Nov. 11, 2014, 12:44 a.m.
On 11/11/2014 12:38 AM, Martin von Zweigbergk wrote:
> Sounds good to me. I think you're the one who picked up this series. Do
> you want me to resend?

Did not follow the whole matt feedback yet. do you need to do some 
actual code change or is the comment the only changed need. If comment 
only, it can be a follow up.
Martin von Zweigbergk - Nov. 11, 2014, 12:49 a.m.
On Mon Nov 10 2014 at 4:45:00 PM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 11/11/2014 12:38 AM, Martin von Zweigbergk wrote:
> > Sounds good to me. I think you're the one who picked up this series. Do
> > you want me to resend?
>
> Did not follow the whole matt feedback yet. do you need to do some
> actual code change or is the comment the only changed need. If comment
> only, it can be a follow up.
>

No real code change needed, but adding Matt's explanation where it
currently says "load earliest manifest first for caching reasons" would be
nice. But never mind, that's probably better done in a separate patch on
top of these other ones anyway. I'll try to remember to send such a patch
once this series is in.
Pierre-Yves David - Nov. 11, 2014, 10:15 a.m.
On 11/11/2014 12:49 AM, Martin von Zweigbergk wrote:
> On Mon Nov 10 2014 at 4:45:00 PM Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>
>
>     On 11/11/2014 12:38 AM, Martin von Zweigbergk wrote:
>      > Sounds good to me. I think you're the one who picked up this
>     series. Do
>      > you want me to resend?
>
>     Did not follow the whole matt feedback yet. do you need to do some
>     actual code change or is the comment the only changed need. If comment
>     only, it can be a follow up.
>
>
> No real code change needed, but adding Matt's explanation where it
> currently says "load earliest manifest first for caching reasons" would
> be nice. But never mind, that's probably better done in a separate patch
> on top of these other ones anyway. I'll try to remember to send such a
> patch once this series is in.

Yes, it can go in a separated patch. If you send it now, I'll put it on 
top of the other.

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -99,9 +99,6 @@ 
         For example, this allows other contexts, such as workingctx, to query
         the dirstate before comparing the manifests.
         """
-        # load earliest manifest first for caching reasons
-        if self.rev() < other.rev():
-            self.manifest()
         return s
 
     def _poststatus(self, other, s, match, listignored, listclean, listunknown):
@@ -115,6 +112,9 @@ 
     def _buildstatus(self, other, s, match, listignored, listclean,
                      listunknown):
         """build a status with respect to another context"""
+        # load earliest manifest first for caching reasons
+        if self.rev() is not None and self.rev() < other.rev():
+            self.manifest()
         mf1 = other._manifestmatches(match, s)
         mf2 = self._manifestmatches(match, s)
 
@@ -1416,9 +1416,7 @@ 
         We use this prestatus hook to populate the status with information from
         the dirstate.
         """
-        # doesn't need to call super; if that changes, be aware that super
-        # calls self.manifest which would slow down the common case of calling
-        # status against a workingctx's parent
+        # doesn't need to call super
         return self._dirstatestatus(match, listignored, listclean, listunknown)
 
     def _poststatus(self, other, s, match, listignored, listclean, listunknown):