Patchwork [3,of,3] manifest: postpone calling array.array() until needed

login
register
mail settings
Submitter Martin von Zweigbergk
Date Feb. 27, 2015, 12:02 a.m.
Message ID <1d3fe60a20d4f6284bc2.1424995369@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/7841/
State Accepted
Headers show

Comments

Martin von Zweigbergk - Feb. 27, 2015, 12:02 a.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1424802971 28800
#      Tue Feb 24 10:36:11 2015 -0800
# Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd
# Parent  ef7dde10dc5941491a19dbd17ccb47a776f0dadd
manifest: postpone calling array.array() until needed

We currently convert a string to an array in manifest.read(), only to
store the result in the manifest cache. In most cases, the array will
not be used, so let's store the original string instead. Storage cost
should be similar assuming strings are stored in memory as byte
arrays. The only disadvantage should be that the conversion can be
done several times if multiple new revisions are created based on the
same cached revision.
Augie Fackler - Feb. 27, 2015, 8:37 p.m.
On Thu, Feb 26, 2015 at 04:02:49PM -0800, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1424802971 28800
> #      Tue Feb 24 10:36:11 2015 -0800
> # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd
> # Parent  ef7dde10dc5941491a19dbd17ccb47a776f0dadd
> manifest: postpone calling array.array() until needed

These three look good, queued.

>
> We currently convert a string to an array in manifest.read(), only to
> store the result in the manifest cache. In most cases, the array will
> not be used, so let's store the original string instead. Storage cost
> should be similar assuming strings are stored in memory as byte
> arrays. The only disadvantage should be that the conversion can be
> done several times if multiple new revisions are created based on the
> same cached revision.
>
> diff -r ef7dde10dc59 -r 1d3fe60a20d4 mercurial/manifest.py
> --- a/mercurial/manifest.py	Tue Feb 24 09:08:54 2015 -0800
> +++ b/mercurial/manifest.py	Tue Feb 24 10:36:11 2015 -0800
> @@ -246,9 +246,8 @@
>          if node in self._mancache:
>              return self._mancache[node][0]
>          text = self.revision(node)
> -        arraytext = array.array('c', text)
>          m = _parse(text)
> -        self._mancache[node] = (m, arraytext)
> +        self._mancache[node] = (m, text)
>          return m
>
>      def find(self, node, f):
> @@ -280,7 +279,8 @@
>              # since the lists are already sorted
>              work.sort()
>
> -            arraytext, deltatext = m.fastdelta(self._mancache[p1][1], work)
> +            base = self._mancache[p1][1]
> +            arraytext, deltatext = m.fastdelta(array.array('c', base), work)
>              cachedelta = self.rev(p1), deltatext
>              text = util.buffer(arraytext)
>          else:
> @@ -289,10 +289,9 @@
>              # through to the revlog layer, and let it handle the delta
>              # process.
>              text = m.text()
> -            arraytext = array.array('c', text)
>              cachedelta = None
>
>          n = self.addrevision(text, transaction, link, p1, p2, cachedelta)
> -        self._mancache[n] = (m, arraytext)
> +        self._mancache[n] = (m, text)
>
>          return n
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Mackall - Feb. 27, 2015, 11:56 p.m.
On Thu, 2015-02-26 at 16:02 -0800, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1424802971 28800
> #      Tue Feb 24 10:36:11 2015 -0800
> # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd
> # Parent  ef7dde10dc5941491a19dbd17ccb47a776f0dadd
> manifest: postpone calling array.array() until needed
> 
> We currently convert a string to an array in manifest.read(), only to
> store the result in the manifest cache. In most cases, the array will
> not be used, so let's store the original string instead. Storage cost
> should be similar assuming strings are stored in memory as byte
> arrays. The only disadvantage should be that the conversion can be
> done several times if multiple new revisions are created based on the
> same cached revision.

I've dropped this one because it looks like it de-optimized back-to-back
commits, for instance of the sort done by pushing a stack of mq patches.
Martin von Zweigbergk - Feb. 28, 2015, 4:56 a.m.
On Fri, Feb 27, 2015 at 3:57 PM Matt Mackall <mpm@selenic.com> wrote:

> On Thu, 2015-02-26 at 16:02 -0800, Martin von Zweigbergk wrote:
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1424802971 28800
> > #      Tue Feb 24 10:36:11 2015 -0800
> > # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd
> > # Parent  ef7dde10dc5941491a19dbd17ccb47a776f0dadd
> > manifest: postpone calling array.array() until needed
> >
> > We currently convert a string to an array in manifest.read(), only to
> > store the result in the manifest cache. In most cases, the array will
> > not be used, so let's store the original string instead. Storage cost
> > should be similar assuming strings are stored in memory as byte
> > arrays. The only disadvantage should be that the conversion can be
> > done several times if multiple new revisions are created based on the
> > same cached revision.
>
> I've dropped this one because it looks like it de-optimized back-to-back
> commits, for instance of the sort done by pushing a stack of mq patches.
>

I see the problem now. Sorry, I should have caught that.

It was a tangent to my other work, so I won't attempt a V2.

Patch

diff -r ef7dde10dc59 -r 1d3fe60a20d4 mercurial/manifest.py
--- a/mercurial/manifest.py	Tue Feb 24 09:08:54 2015 -0800
+++ b/mercurial/manifest.py	Tue Feb 24 10:36:11 2015 -0800
@@ -246,9 +246,8 @@ 
         if node in self._mancache:
             return self._mancache[node][0]
         text = self.revision(node)
-        arraytext = array.array('c', text)
         m = _parse(text)
-        self._mancache[node] = (m, arraytext)
+        self._mancache[node] = (m, text)
         return m
 
     def find(self, node, f):
@@ -280,7 +279,8 @@ 
             # since the lists are already sorted
             work.sort()
 
-            arraytext, deltatext = m.fastdelta(self._mancache[p1][1], work)
+            base = self._mancache[p1][1]
+            arraytext, deltatext = m.fastdelta(array.array('c', base), work)
             cachedelta = self.rev(p1), deltatext
             text = util.buffer(arraytext)
         else:
@@ -289,10 +289,9 @@ 
             # through to the revlog layer, and let it handle the delta
             # process.
             text = m.text()
-            arraytext = array.array('c', text)
             cachedelta = None
 
         n = self.addrevision(text, transaction, link, p1, p2, cachedelta)
-        self._mancache[n] = (m, arraytext)
+        self._mancache[n] = (m, text)
 
         return n