Submitter | Siddharth Agarwal |
---|---|
Date | Feb. 10, 2013, 3:24 p.m. |
Message ID | <8c50a5734bb2b8833eb4.1360509840@sid0x220> |
Download | mbox | patch |
Permalink | /patch/936/ |
State | Superseded |
Commit | 0969980308c7f728811b956018787c9f1b01ec69 |
Headers | show |
Comments
On Sun, Feb 10, 2013 at 3:24 PM, Siddharth Agarwal <sid0@fb.com> wrote: > # HG changeset patch > # User Siddharth Agarwal <sid0@fb.com> > # Date 1360509284 0 > # Node ID 8c50a5734bb2b8833eb425db95ce26073322d31d > # Parent 6fc7952d6819acbc212cdde688edc256c9bd4b2e > dirstate: disable gc while parsing the dirstate > > diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py > --- a/mercurial/dirstate.py > +++ b/mercurial/dirstate.py > @@ -9,7 +9,7 @@ import errno > from node import nullid > from i18n import _ > import scmutil, util, ignore, osutil, parsers, encoding > -import os, stat, errno > +import os, stat, errno, gc > > propertycache = util.propertycache > filecache = scmutil.filecache > @@ -285,7 +285,19 @@ class dirstate(object): > if not st: > return > > + # Python's garbage collector triggers a GC each time a certain number > + # of container objects (the number being defined by > + # gc.get_threshold()) are allocated. parse_dirstate creates a tuple > + # for each file in the dirstate. The C version then immediately marks > + # them as not to be tracked by the collector. However, this has no > + # effect on when GCs are triggered, only on what objects the GC looks > + # into. This means that O(number of files) GCs are unavoidable. > + # Depending on when in the process's lifetime the dirstate is parsed, > + # this can get very expensive. As a workaround, disable GC while > + # parsing the dirstate. > + gc.disable() > p = parsers.parse_dirstate(self._map, self._copymap, st) > + gc.enable() > if not self._dirtypl: > self._pl = p Can you provide some before and after performance numbers? > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
>+ gc.disable() > p = parsers.parse_dirstate(self._map, self._copymap, st) >+ gc.enable() > if not self._dirtypl: > self._pl = p Probably needs to be in a try/finally?
On Sun, 2013-02-10 at 15:43 +0000, Durham Goode wrote: > >+ gc.disable() > > p = parsers.parse_dirstate(self._map, self._copymap, st) > >+ gc.enable() > > if not self._dirtypl: > > self._pl = p > > Probably needs to be in a try/finally? Yes, you're correct.
On 02/10/2013 03:28 PM, Brodie Rao wrote:
> Can you provide some before and after performance numbers?
It fixes a perf regression a patch I'm going to bomb shortly introduces
(because it indirectly delays parsing the manifest a bit). For a repo
with 150,000 files, it speeds up a no-op update from 2.13 seconds to 1.98.
On 02/10/2013 03:53 PM, Siddharth Agarwal wrote: > On 02/10/2013 03:28 PM, Brodie Rao wrote: >> Can you provide some before and after performance numbers? > > It fixes a perf regression a patch I'm going to bomb shortly > introduces (because it indirectly delays parsing the manifest a bit). Sorry, I meant parsing the *dirstate*.
Patch
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py --- a/mercurial/dirstate.py +++ b/mercurial/dirstate.py @@ -9,7 +9,7 @@ import errno from node import nullid from i18n import _ import scmutil, util, ignore, osutil, parsers, encoding -import os, stat, errno +import os, stat, errno, gc propertycache = util.propertycache filecache = scmutil.filecache @@ -285,7 +285,19 @@ class dirstate(object): if not st: return + # Python's garbage collector triggers a GC each time a certain number + # of container objects (the number being defined by + # gc.get_threshold()) are allocated. parse_dirstate creates a tuple + # for each file in the dirstate. The C version then immediately marks + # them as not to be tracked by the collector. However, this has no + # effect on when GCs are triggered, only on what objects the GC looks + # into. This means that O(number of files) GCs are unavoidable. + # Depending on when in the process's lifetime the dirstate is parsed, + # this can get very expensive. As a workaround, disable GC while + # parsing the dirstate. + gc.disable() p = parsers.parse_dirstate(self._map, self._copymap, st) + gc.enable() if not self._dirtypl: self._pl = p