Patchwork [3,of,3] revlog: prevent recreating a tuple again and again for each rev

login
register
mail settings
Submitter Pulkit Goyal
Date Oct. 10, 2020, 8:47 a.m.
Message ID <a96cd3eebf8eba04a351.1602319644@workspace>
Download mbox | patch
Permalink /patch/47435/
State Accepted
Headers show

Comments

Pulkit Goyal - Oct. 10, 2020, 8:47 a.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1602252446 -19800
#      Fri Oct 09 19:37:26 2020 +0530
# Node ID a96cd3eebf8eba04a351615728c9635e8941af10
# Parent  911cea33820c98da3fa2e4891153e674600df1af
# EXP-Topic sidedata-upgrade
revlog: prevent recreating a tuple again and again for each rev

I am tracking a memory leak during upgrade which does revlog cloning. Although
this does not make much of difference but seemed like a nice change to me.
Yuya Nishihara - Oct. 11, 2020, 2:32 a.m.
On Sat, 10 Oct 2020 14:17:24 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1602252446 -19800
> #      Fri Oct 09 19:37:26 2020 +0530
> # Node ID a96cd3eebf8eba04a351615728c9635e8941af10
> # Parent  911cea33820c98da3fa2e4891153e674600df1af
> # EXP-Topic sidedata-upgrade
> revlog: prevent recreating a tuple again and again for each rev
> 
> I am tracking a memory leak during upgrade which does revlog cloning. Although
> this does not make much of difference but seemed like a nice change to me.
> 
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -2773,6 +2773,7 @@ class revlog(object):
>          sidedatachanges = sidedatacompanion is not None
>          deltacomputer = deltautil.deltacomputer(destrevlog)
>          index = self.index
> +        defsidedataactions = (False, (), {})

I don't think this would matter for memory usage, but sidedataactions
can be assigned here if we really want to optimize it.

  sidedataactions = (False, [], {})  # constant if not sidedatachanges
  for rev in self:
      ...
      if sidedatachanges:
          sidedataactions = ...
Joerg Sonnenberger - Oct. 11, 2020, 7:51 p.m.
On Sun, Oct 11, 2020 at 11:32:34AM +0900, Yuya Nishihara wrote:
> On Sat, 10 Oct 2020 14:17:24 +0530, Pulkit Goyal wrote:
> > # HG changeset patch
> > # User Pulkit Goyal <7895pulkit@gmail.com>
> > # Date 1602252446 -19800
> > #      Fri Oct 09 19:37:26 2020 +0530
> > # Node ID a96cd3eebf8eba04a351615728c9635e8941af10
> > # Parent  911cea33820c98da3fa2e4891153e674600df1af
> > # EXP-Topic sidedata-upgrade
> > revlog: prevent recreating a tuple again and again for each rev
> > 
> > I am tracking a memory leak during upgrade which does revlog cloning. Although
> > this does not make much of difference but seemed like a nice change to me.
> > 
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -2773,6 +2773,7 @@ class revlog(object):
> >          sidedatachanges = sidedatacompanion is not None
> >          deltacomputer = deltautil.deltacomputer(destrevlog)
> >          index = self.index
> > +        defsidedataactions = (False, (), {})
> 
> I don't think this would matter for memory usage, but sidedataactions
> can be assigned here if we really want to optimize it.

The difficult and dangerous part here is the dictionary as mutable
field. That prevents the use of a single constant object. I don't know
whether the action is modified and therefore if sharing is valid.

Joerg
Yuya Nishihara - Oct. 12, 2020, 10:13 a.m.
On Sun, 11 Oct 2020 21:51:37 +0200, Joerg Sonnenberger wrote:
> On Sun, Oct 11, 2020 at 11:32:34AM +0900, Yuya Nishihara wrote:
> > On Sat, 10 Oct 2020 14:17:24 +0530, Pulkit Goyal wrote:
> > > # HG changeset patch
> > > # User Pulkit Goyal <7895pulkit@gmail.com>
> > > # Date 1602252446 -19800
> > > #      Fri Oct 09 19:37:26 2020 +0530
> > > # Node ID a96cd3eebf8eba04a351615728c9635e8941af10
> > > # Parent  911cea33820c98da3fa2e4891153e674600df1af
> > > # EXP-Topic sidedata-upgrade
> > > revlog: prevent recreating a tuple again and again for each rev
> > > 
> > > I am tracking a memory leak during upgrade which does revlog cloning. Although
> > > this does not make much of difference but seemed like a nice change to me.
> > > 
> > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > > --- a/mercurial/revlog.py
> > > +++ b/mercurial/revlog.py
> > > @@ -2773,6 +2773,7 @@ class revlog(object):
> > >          sidedatachanges = sidedatacompanion is not None
> > >          deltacomputer = deltautil.deltacomputer(destrevlog)
> > >          index = self.index
> > > +        defsidedataactions = (False, (), {})
> > 
> > I don't think this would matter for memory usage, but sidedataactions
> > can be assigned here if we really want to optimize it.
> 
> The difficult and dangerous part here is the dictionary as mutable
> field. That prevents the use of a single constant object.

Yep, creating defsidedataactions per loop is safer option if we don't
know the update dict isn't mutated. I just meant the defsidedataactions
reference can be eliminated.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -2773,6 +2773,7 @@  class revlog(object):
         sidedatachanges = sidedatacompanion is not None
         deltacomputer = deltautil.deltacomputer(destrevlog)
         index = self.index
+        defsidedataactions = (False, (), {})
         for rev in self:
             entry = index[rev]
 
@@ -2784,9 +2785,10 @@  class revlog(object):
             p2 = index[entry[6]][7]
             node = entry[7]
 
-            sidedataactions = (False, [], {})
             if sidedatachanges:
                 sidedataactions = sidedatacompanion(self, rev)
+            else:
+                sidedataactions = defsidedataactions
 
             # (Possibly) reuse the delta from the revlog if allowed and
             # the revlog chunk is a delta.