Submitter | Martin von Zweigbergk |
---|---|
Date | Feb. 27, 2015, 12:02 a.m. |
Message ID | <1d3fe60a20d4f6284bc2.1424995369@martinvonz.mtv.corp.google.com> |
Download | mbox | patch |
Permalink | /patch/7841/ |
State | Accepted |
Headers | show |
Comments
On Thu, Feb 26, 2015 at 04:02:49PM -0800, Martin von Zweigbergk wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com> > # Date 1424802971 28800 > # Tue Feb 24 10:36:11 2015 -0800 > # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd > # Parent ef7dde10dc5941491a19dbd17ccb47a776f0dadd > manifest: postpone calling array.array() until needed These three look good, queued. > > We currently convert a string to an array in manifest.read(), only to > store the result in the manifest cache. In most cases, the array will > not be used, so let's store the original string instead. Storage cost > should be similar assuming strings are stored in memory as byte > arrays. The only disadvantage should be that the conversion can be > done several times if multiple new revisions are created based on the > same cached revision. > > diff -r ef7dde10dc59 -r 1d3fe60a20d4 mercurial/manifest.py > --- a/mercurial/manifest.py Tue Feb 24 09:08:54 2015 -0800 > +++ b/mercurial/manifest.py Tue Feb 24 10:36:11 2015 -0800 > @@ -246,9 +246,8 @@ > if node in self._mancache: > return self._mancache[node][0] > text = self.revision(node) > - arraytext = array.array('c', text) > m = _parse(text) > - self._mancache[node] = (m, arraytext) > + self._mancache[node] = (m, text) > return m > > def find(self, node, f): > @@ -280,7 +279,8 @@ > # since the lists are already sorted > work.sort() > > - arraytext, deltatext = m.fastdelta(self._mancache[p1][1], work) > + base = self._mancache[p1][1] > + arraytext, deltatext = m.fastdelta(array.array('c', base), work) > cachedelta = self.rev(p1), deltatext > text = util.buffer(arraytext) > else: > @@ -289,10 +289,9 @@ > # through to the revlog layer, and let it handle the delta > # process. > text = m.text() > - arraytext = array.array('c', text) > cachedelta = None > > n = self.addrevision(text, transaction, link, p1, p2, cachedelta) > - self._mancache[n] = (m, arraytext) > + self._mancache[n] = (m, text) > > return n > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On Thu, 2015-02-26 at 16:02 -0800, Martin von Zweigbergk wrote: > # HG changeset patch > # User Martin von Zweigbergk <martinvonz@google.com> > # Date 1424802971 28800 > # Tue Feb 24 10:36:11 2015 -0800 > # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd > # Parent ef7dde10dc5941491a19dbd17ccb47a776f0dadd > manifest: postpone calling array.array() until needed > > We currently convert a string to an array in manifest.read(), only to > store the result in the manifest cache. In most cases, the array will > not be used, so let's store the original string instead. Storage cost > should be similar assuming strings are stored in memory as byte > arrays. The only disadvantage should be that the conversion can be > done several times if multiple new revisions are created based on the > same cached revision. I've dropped this one because it looks like it de-optimized back-to-back commits, for instance of the sort done by pushing a stack of mq patches.
On Fri, Feb 27, 2015 at 3:57 PM Matt Mackall <mpm@selenic.com> wrote: > On Thu, 2015-02-26 at 16:02 -0800, Martin von Zweigbergk wrote: > > # HG changeset patch > > # User Martin von Zweigbergk <martinvonz@google.com> > > # Date 1424802971 28800 > > # Tue Feb 24 10:36:11 2015 -0800 > > # Node ID 1d3fe60a20d4f6284bc295af27f9f272af30dcdd > > # Parent ef7dde10dc5941491a19dbd17ccb47a776f0dadd > > manifest: postpone calling array.array() until needed > > > > We currently convert a string to an array in manifest.read(), only to > > store the result in the manifest cache. In most cases, the array will > > not be used, so let's store the original string instead. Storage cost > > should be similar assuming strings are stored in memory as byte > > arrays. The only disadvantage should be that the conversion can be > > done several times if multiple new revisions are created based on the > > same cached revision. > > I've dropped this one because it looks like it de-optimized back-to-back > commits, for instance of the sort done by pushing a stack of mq patches. > I see the problem now. Sorry, I should have caught that. It was a tangent to my other work, so I won't attempt a V2.
Patch
diff -r ef7dde10dc59 -r 1d3fe60a20d4 mercurial/manifest.py --- a/mercurial/manifest.py Tue Feb 24 09:08:54 2015 -0800 +++ b/mercurial/manifest.py Tue Feb 24 10:36:11 2015 -0800 @@ -246,9 +246,8 @@ if node in self._mancache: return self._mancache[node][0] text = self.revision(node) - arraytext = array.array('c', text) m = _parse(text) - self._mancache[node] = (m, arraytext) + self._mancache[node] = (m, text) return m def find(self, node, f): @@ -280,7 +279,8 @@ # since the lists are already sorted work.sort() - arraytext, deltatext = m.fastdelta(self._mancache[p1][1], work) + base = self._mancache[p1][1] + arraytext, deltatext = m.fastdelta(array.array('c', base), work) cachedelta = self.rev(p1), deltatext text = util.buffer(arraytext) else: @@ -289,10 +289,9 @@ # through to the revlog layer, and let it handle the delta # process. text = m.text() - arraytext = array.array('c', text) cachedelta = None n = self.addrevision(text, transaction, link, p1, p2, cachedelta) - self._mancache[n] = (m, arraytext) + self._mancache[n] = (m, text) return n