Submitter | Pierre-Yves David |
---|---|
Date | Nov. 30, 2014, 1:57 a.m. |
Message ID | <d0f3dac4ea2b4aff5194.1417312662@marginatus.alto.octopoid.net> |
Download | mbox | patch |
Permalink | /patch/6890/ |
State | Superseded |
Headers | show |
Comments
On Sat Nov 29 2014 at 5:58:52 PM Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > # HG changeset patch > # User Pierre-Yves David <pierre-yves.david@fb.com> > # Date 1417049911 28800 > # Wed Nov 26 16:58:31 2014 -0800 > # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > obsstore: disable garbage collection during initialisation (issue4456) > > Python garbage collection is triggered by contained creation. So code that > creates a lot of tuple tends to trigger GC a lot. We disable the gc during > obsolescence marker parsing and associated initialization. On your repo, do you know the peak memory usage before and after? My gut feeling is that the amount of garbage that's now not collected until later is nothing to be concerned about, but it would be nice to confirm that. > The provide and > interesting speedup (25%). > > On my 58758 markers repo: > before: 0.468247 seconds > after: 0.344362 seconds > Do you see similar values from 'time hg ...'? Without trying to understand the patch, it sounds like you could just be moving the GC out of the method that's being timed. I just want to make sure it's an overall gain. > Thanks goes to Siddharth Agarwal for the lead. > > diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py > --- a/mercurial/obsolete.py > +++ b/mercurial/obsolete.py > @@ -66,10 +66,11 @@ The file starts with a version header: > The header is followed by the markers. Marker format depend of the > version. See > comment associated with each format for details. > > """ > import struct > +import gc > import util, base85, node > import phases > from i18n import _ > > _pack = struct.pack > @@ -466,12 +467,28 @@ class obsstore(object): > self.sopener = sopener > data = sopener.tryread('obsstore') > self._version = defaultformat > self._readonly = readonly > if data: > - self._version, markers = _readmarkers(data) > - self._load(markers) > + # Python's garbage collector triggers a GC each time a certain > + # number of container objects (the number being defined by > + # gc.get_threshold()) are allocated. Markers parsing creates > + # multiple tuples while parsing each markers so the gc is > triggered > + # a lot while parsing an high number of markers. As a > workaround, > + # disable GC during initialisation. > What was gc.get_threshold() in your case? Do you know how much memory was allocated? How many times GC was forced before this patch? > + # > + # This would probably marker parsing during exchange but I do > not > + # expect the order of magnitude to matter outside of > initialisation > + # case. > + gcenabled = gc.isenabled() > + gc.disable() > + try: > + self._version, markers = _readmarkers(data) > + self._load(markers) > + finally: > + if gcenabled: > + gc.enable() > > def __iter__(self): > return iter(self._all) > > def __len__(self): > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel >
On 11/29/2014 10:29 PM, Martin von Zweigbergk wrote: > > > On Sat Nov 29 2014 at 5:58:52 PM Pierre-Yves David > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> > wrote: > > # HG changeset patch > # User Pierre-Yves David <pierre-yves.david@fb.com > <mailto:pierre-yves.david@fb.com>> > # Date 1417049911 28800 > # Wed Nov 26 16:58:31 2014 -0800 > # Node ID d0f3dac4ea2b4aff51946c7db0834a__a4e5c3e82a > # Parent 04eb7e49d2b6f90f71aa85de9ad0b4__d70670d688 > obsstore: disable garbage collection during initialisation (issue4456) > > Python garbage collection is triggered by contained creation. So > code that > creates a lot of tuple tends to trigger GC a lot. We disable the gc > during > obsolescence marker parsing and associated initialization. > > > On your repo, do you know the peak memory usage before and after? My gut > feeling is that the amount of garbage that's now not collected until > later is nothing to be concerned about, but it would be nice to confirm > that. no relevant peak detected. > > The provide and > interesting speedup (25%). > > On my 58758 markers repo: > before: 0.468247 seconds > after: 0.344362 seconds > > > Do you see similar values from 'time hg ...'? Without trying to > understand the patch, it sounds like you could just be moving the GC out > of the method that's being timed. I just want to make sure it's an > overall gain. I do see similar gain, the overall time is much more unstable, but I get about the same benefit 0.05s > > Thanks goes to Siddharth Agarwal for the lead. > > diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py > --- a/mercurial/obsolete.py > +++ b/mercurial/obsolete.py > @@ -66,10 +66,11 @@ The file starts with a version header: > The header is followed by the markers. Marker format depend of the > version. See > comment associated with each format for details. > > """ > import struct > +import gc > import util, base85, node > import phases > from i18n import _ > > _pack = struct.pack > @@ -466,12 +467,28 @@ class obsstore(object): > self.sopener = sopener > data = sopener.tryread('obsstore') > self._version = defaultformat > self._readonly = readonly > if data: > - self._version, markers = _readmarkers(data) > - self._load(markers) > + # Python's garbage collector triggers a GC each time a > certain > + # number of container objects (the number being defined by > + # gc.get_threshold()) are allocated. Markers parsing > creates > + # multiple tuples while parsing each markers so the gc > is triggered > + # a lot while parsing an high number of markers. As a > workaround, > + # disable GC during initialisation. > > > What was gc.get_threshold() in your case? Do you know how much memory > was allocated? How many times GC was forced before this patch? Nope. I did not digged that far. The whole thing should be reimplemented in C anyway.
On Sat, 29 Nov 2014 17:57:42 -0800 Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > # HG changeset patch > # User Pierre-Yves David <pierre-yves.david@fb.com> > # Date 1417049911 28800 > # Wed Nov 26 16:58:31 2014 -0800 > # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > obsstore: disable garbage collection during initialisation (issue4456) > > Python garbage collection is triggered by contained creation. So code that > creates a lot of tuple tends to trigger GC a lot. We disable the gc during > obsolescence marker parsing and associated initialization. The provide and > interesting speedup (25%). > > On my 58758 markers repo: > before: 0.468247 seconds > after: 0.344362 seconds Which Python version is that? Regards Antoine.
On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > On Sat, 29 Nov 2014 17:57:42 -0800 > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: >> # HG changeset patch >> # User Pierre-Yves David <pierre-yves.david@fb.com> >> # Date 1417049911 28800 >> # Wed Nov 26 16:58:31 2014 -0800 >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 >> obsstore: disable garbage collection during initialisation (issue4456) >> >> Python garbage collection is triggered by contained creation. So code that >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during >> obsolescence marker parsing and associated initialization. The provide and >> interesting speedup (25%). >> >> On my 58758 markers repo: >> before: 0.468247 seconds >> after: 0.344362 seconds > > Which Python version is that? Python 2.7.8 (default, Oct 18 2014, 12:50:18) [GCC 4.9.1]
On Sat Nov 29 2014 at 11:47:09 PM Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > > > On 11/29/2014 10:29 PM, Martin von Zweigbergk wrote: > > > > The provide and > > interesting speedup (25%). > > > > On my 58758 markers repo: > > before: 0.468247 seconds > > after: 0.344362 seconds > > > > > > Do you see similar values from 'time hg ...'? Without trying to > > understand the patch, it sounds like you could just be moving the GC out > > of the method that's being timed. I just want to make sure it's an > > overall gain. > > I do see similar gain, the overall time is much more unstable, but I get > about the same benefit 0.05s > 0.05s is less than 0.12s and more like 10-15% than 25%, assuming you're still talking about the same baseline of ~0.46s. It's still a significant speedup, so it seems worthwhile, but you may want to include the "end-to-end" times in the description. Thanks for checking and for spending time on this. It's one of the most noticeable performance problems I see in day-to-day use.
On 11/29/2014 05:57 PM, Pierre-Yves David wrote: > # HG changeset patch > # User Pierre-Yves David <pierre-yves.david@fb.com> > # Date 1417049911 28800 > # Wed Nov 26 16:58:31 2014 -0800 > # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > obsstore: disable garbage collection during initialisation (issue4456) We also disable the GC while loading the dirstate, for the same reason. Can we factor the disable-GC logic out, into a decorator perhaps? (I did the same thing for hgsubversion recently for a ~3x perf win on a large repo, and I suspect we're going to have to do this more soon.) > > Python garbage collection is triggered by contained "container" > creation. So code that > creates a lot of tuple tends to trigger GC a lot. We disable the gc during > obsolescence marker parsing and associated initialization. The provide and "This provides an" > interesting speedup (25%). > > On my 58758 markers repo: > before: 0.468247 seconds > after: 0.344362 seconds > > Thanks goes to Siddharth Agarwal for the lead. > > diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py > --- a/mercurial/obsolete.py > +++ b/mercurial/obsolete.py > @@ -66,10 +66,11 @@ The file starts with a version header: > The header is followed by the markers. Marker format depend of the version. See > comment associated with each format for details. > > """ > import struct > +import gc > import util, base85, node > import phases > from i18n import _ > > _pack = struct.pack > @@ -466,12 +467,28 @@ class obsstore(object): > self.sopener = sopener > data = sopener.tryread('obsstore') > self._version = defaultformat > self._readonly = readonly > if data: > - self._version, markers = _readmarkers(data) > - self._load(markers) > + # Python's garbage collector triggers a GC each time a certain > + # number of container objects (the number being defined by > + # gc.get_threshold()) are allocated. Markers parsing creates > + # multiple tuples while parsing each markers so the gc is triggered > + # a lot while parsing an high number of markers. As a workaround, > + # disable GC during initialisation. > + # > + # This would probably marker parsing during exchange but I do not > + # expect the order of magnitude to matter outside of initialisation > + # case. I'm unable to parse this last sentence. > + gcenabled = gc.isenabled() > + gc.disable() > + try: > + self._version, markers = _readmarkers(data) > + self._load(markers) > + finally: > + if gcenabled: > + gc.enable() > > def __iter__(self): > return iter(self._all) > > def __len__(self): > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel
On 11/30/2014 10:27 PM, Martin von Zweigbergk wrote: > > > On Sat Nov 29 2014 at 11:47:09 PM Pierre-Yves David > <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>> > wrote: > > > > On 11/29/2014 10:29 PM, Martin von Zweigbergk wrote: > > > > The provide and > > interesting speedup (25%). > > > > On my 58758 markers repo: > > before: 0.468247 seconds > > after: 0.344362 seconds > > > > > > Do you see similar values from 'time hg ...'? Without trying to > > understand the patch, it sounds like you could just be moving the > GC out > > of the method that's being timed. I just want to make sure it's an > > overall gain. > > I do see similar gain, the overall time is much more unstable, but I get > about the same benefit 0.05s > > > 0.05s is less than 0.12s and more like 10-15% than 25%, assuming you're > still talking about the same baseline of ~0.46s. It's still a > significant speedup, so it seems worthwhile, but you may want to include > the "end-to-end" times in the description. I cannot read… I though the speedup was 8 so 5 is not too bad. I did some more time testing and I cannot see a time different form small operation like 'hg id'. So we should probably drop that patches from the series (the rest is okay)
On 12/01/2014 04:35 PM, Siddharth Agarwal wrote: > On 11/29/2014 05:57 PM, Pierre-Yves David wrote: >> # HG changeset patch >> # User Pierre-Yves David <pierre-yves.david@fb.com> >> # Date 1417049911 28800 >> # Wed Nov 26 16:58:31 2014 -0800 >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 >> obsstore: disable garbage collection during initialisation (issue4456) > > We also disable the GC while loading the dirstate, for the same reason. > Can we factor the disable-GC logic out, into a decorator perhaps? (I did > the same thing for hgsubversion recently for a ~3x perf win on a large > repo, and I suspect we're going to have to do this more soon.) So I've come to the conclusion that we do not actually save time by doing this (just moving the time somewhere else). But I this this decorator idea is great
On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > > On Sat, 29 Nov 2014 17:57:42 -0800 > > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > >> # HG changeset patch > >> # User Pierre-Yves David <pierre-yves.david@fb.com> > >> # Date 1417049911 28800 > >> # Wed Nov 26 16:58:31 2014 -0800 > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > >> obsstore: disable garbage collection during initialisation (issue4456) > >> > >> Python garbage collection is triggered by contained creation. So code that > >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during > >> obsolescence marker parsing and associated initialization. The provide and > >> interesting speedup (25%). > >> > >> On my 58758 markers repo: > >> before: 0.468247 seconds > >> after: 0.344362 seconds > > > > Which Python version is that? > > Python 2.7.8 (default, Oct 18 2014, 12:50:18) > [GCC 4.9.1] Python's GC behavior while building large containers is quite questionable (aka quadratic). We habitually knee-cap it. We'd have probably complained about it upstream years ago, but we discovered it around the time all the versions of Python Mercurial runs on were officially declared abandonware by PEP 404. http://www.selenic.com/hg/file/756376ec6c12/mercurial/dirstate.py#l310
On Wed, 03 Dec 2014 17:31:25 -0600 Matt Mackall <mpm@selenic.com> wrote: > On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: > > > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > > > On Sat, 29 Nov 2014 17:57:42 -0800 > > > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > >> # HG changeset patch > > >> # User Pierre-Yves David <pierre-yves.david@fb.com> > > >> # Date 1417049911 28800 > > >> # Wed Nov 26 16:58:31 2014 -0800 > > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > > >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > > >> obsstore: disable garbage collection during initialisation (issue4456) > > >> > > >> Python garbage collection is triggered by contained creation. So code that > > >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during > > >> obsolescence marker parsing and associated initialization. The provide and > > >> interesting speedup (25%). > > >> > > >> On my 58758 markers repo: > > >> before: 0.468247 seconds > > >> after: 0.344362 seconds > > > > > > Which Python version is that? > > > > Python 2.7.8 (default, Oct 18 2014, 12:50:18) > > [GCC 4.9.1] > > Python's GC behavior while building large containers is quite > questionable (aka quadratic). It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/ If you have a reproducer, please open an issue at bugs.python.org. Regards Antoine.
On Thu, 2014-12-04 at 01:44 +0100, Antoine Pitrou wrote: > On Wed, 03 Dec 2014 17:31:25 -0600 > Matt Mackall <mpm@selenic.com> wrote: > > On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: > > > > > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > > > > On Sat, 29 Nov 2014 17:57:42 -0800 > > > > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > > >> # HG changeset patch > > > >> # User Pierre-Yves David <pierre-yves.david@fb.com> > > > >> # Date 1417049911 28800 > > > >> # Wed Nov 26 16:58:31 2014 -0800 > > > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > > > >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > > > >> obsstore: disable garbage collection during initialisation (issue4456) > > > >> > > > >> Python garbage collection is triggered by contained creation. So code that > > > >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during > > > >> obsolescence marker parsing and associated initialization. The provide and > > > >> interesting speedup (25%). > > > >> > > > >> On my 58758 markers repo: > > > >> before: 0.468247 seconds > > > >> after: 0.344362 seconds > > > > > > > > Which Python version is that? > > > > > > Python 2.7.8 (default, Oct 18 2014, 12:50:18) > > > [GCC 4.9.1] > > > > Python's GC behavior while building large containers is quite > > questionable (aka quadratic). > > It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/ > > If you have a reproducer, please open an issue at bugs.python.org. You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, it's down to O(n): generic construction of lists of tuples gets about 4x slower. So it's still definitely worth disabling for any potentially large list or dict we might build. And obsolete markers are in that category. And the quadratic behavior is still present in 2.6 (aka "the current Python" for people running enterprise distros) which we'll probably be supporting for a few years still.
On 12/03/2014 06:00 PM, Matt Mackall wrote: > On Thu, 2014-12-04 at 01:44 +0100, Antoine Pitrou wrote: >> On Wed, 03 Dec 2014 17:31:25 -0600 >> Matt Mackall <mpm@selenic.com> wrote: >>> On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: >>>> >>>> On 11/30/2014 05:06 AM, Antoine Pitrou wrote: >>>>> On Sat, 29 Nov 2014 17:57:42 -0800 >>>>> Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: >>>>>> # HG changeset patch >>>>>> # User Pierre-Yves David <pierre-yves.david@fb.com> >>>>>> # Date 1417049911 28800 >>>>>> # Wed Nov 26 16:58:31 2014 -0800 >>>>>> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a >>>>>> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 >>>>>> obsstore: disable garbage collection during initialisation (issue4456) >>>>>> >>>>>> Python garbage collection is triggered by contained creation. So code that >>>>>> creates a lot of tuple tends to trigger GC a lot. We disable the gc during >>>>>> obsolescence marker parsing and associated initialization. The provide and >>>>>> interesting speedup (25%). >>>>>> >>>>>> On my 58758 markers repo: >>>>>> before: 0.468247 seconds >>>>>> after: 0.344362 seconds >>>>> >>>>> Which Python version is that? >>>> >>>> Python 2.7.8 (default, Oct 18 2014, 12:50:18) >>>> [GCC 4.9.1] >>> >>> Python's GC behavior while building large containers is quite >>> questionable (aka quadratic). >> >> It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/ >> >> If you have a reproducer, please open an issue at bugs.python.org. > > You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, > it's down to O(n): generic construction of lists of tuples gets about 4x > slower. So it's still definitely worth disabling for any potentially > large list or dict we might build. And obsolete markers are in that > category. > > And the quadratic behavior is still present in 2.6 (aka "the current > Python" for people running enterprise distros) which we'll probably be > supporting for a few years still. Interesting, lets go for disabling GC during obsmarker loading then.
On Wed, 03 Dec 2014 20:00:23 -0600 Matt Mackall <mpm@selenic.com> wrote: > On Thu, 2014-12-04 at 01:44 +0100, Antoine Pitrou wrote: > > On Wed, 03 Dec 2014 17:31:25 -0600 > > Matt Mackall <mpm@selenic.com> wrote: > > > On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: > > > > > > > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > > > > > On Sat, 29 Nov 2014 17:57:42 -0800 > > > > > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > > > >> # HG changeset patch > > > > >> # User Pierre-Yves David <pierre-yves.david@fb.com> > > > > >> # Date 1417049911 28800 > > > > >> # Wed Nov 26 16:58:31 2014 -0800 > > > > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > > > > >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > > > > >> obsstore: disable garbage collection during initialisation (issue4456) > > > > >> > > > > >> Python garbage collection is triggered by contained creation. So code that > > > > >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during > > > > >> obsolescence marker parsing and associated initialization. The provide and > > > > >> interesting speedup (25%). > > > > >> > > > > >> On my 58758 markers repo: > > > > >> before: 0.468247 seconds > > > > >> after: 0.344362 seconds > > > > > > > > > > Which Python version is that? > > > > > > > > Python 2.7.8 (default, Oct 18 2014, 12:50:18) > > > > [GCC 4.9.1] > > > > > > Python's GC behavior while building large containers is quite > > > questionable (aka quadratic). > > > > It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/ > > > > If you have a reproducer, please open an issue at bugs.python.org. > > You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, > it's down to O(n): generic construction of lists of tuples gets about 4x > slower. So it's still definitely worth disabling for any potentially > large list or dict we might build. And obsolete markers are in that > category. Thanks. I'm wondering if there would be a way to improve our current heuristics to alleviate such issues. Regards Antoine.
On 12/01/2014 04:35 PM, Siddharth Agarwal wrote: > On 11/29/2014 05:57 PM, Pierre-Yves David wrote: >> # HG changeset patch >> # User Pierre-Yves David <pierre-yves.david@fb.com> >> # Date 1417049911 28800 >> # Wed Nov 26 16:58:31 2014 -0800 >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 >> obsstore: disable garbage collection during initialisation (issue4456) > > We also disable the GC while loading the dirstate, for the same reason. > Can we factor the disable-GC logic out, into a decorator perhaps? (I did > the same thing for hgsubversion recently for a ~3x perf win on a large > repo, and I suspect we're going to have to do this more soon.) > > >> >> Python garbage collection is triggered by contained > > "container" > >> creation. So code that >> creates a lot of tuple tends to trigger GC a lot. We disable the gc >> during >> obsolescence marker parsing and associated initialization. The provide >> and > > "This provides an" > >> interesting speedup (25%). >> >> On my 58758 markers repo: >> before: 0.468247 seconds >> after: 0.344362 seconds >> >> Thanks goes to Siddharth Agarwal for the lead. >> >> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py >> --- a/mercurial/obsolete.py >> +++ b/mercurial/obsolete.py >> @@ -66,10 +66,11 @@ The file starts with a version header: >> The header is followed by the markers. Marker format depend of the >> version. See >> comment associated with each format for details. >> """ >> import struct >> +import gc >> import util, base85, node >> import phases >> from i18n import _ >> _pack = struct.pack >> @@ -466,12 +467,28 @@ class obsstore(object): >> self.sopener = sopener >> data = sopener.tryread('obsstore') >> self._version = defaultformat >> self._readonly = readonly >> if data: >> - self._version, markers = _readmarkers(data) >> - self._load(markers) >> + # Python's garbage collector triggers a GC each time a >> certain >> + # number of container objects (the number being defined by >> + # gc.get_threshold()) are allocated. Markers parsing creates >> + # multiple tuples while parsing each markers so the gc is >> triggered >> + # a lot while parsing an high number of markers. As a >> workaround, >> + # disable GC during initialisation. >> + # >> + # This would probably marker parsing during exchange but >> I do not >> + # expect the order of magnitude to matter outside of >> initialisation >> + # case. > > I'm unable to parse this last sentence. unmarmouted as: # This could probaly also speed up marker parsing during exchange # but I do not expect the order of magnitude to matter outside of # initialisation case. But I've dropped it from the V2 with decorator with I
On Thu, 2014-12-04 at 12:49 +0100, Antoine Pitrou wrote: > On Wed, 03 Dec 2014 20:00:23 -0600 > Matt Mackall <mpm@selenic.com> wrote: > > On Thu, 2014-12-04 at 01:44 +0100, Antoine Pitrou wrote: > > > On Wed, 03 Dec 2014 17:31:25 -0600 > > > Matt Mackall <mpm@selenic.com> wrote: > > > > On Sun, 2014-11-30 at 05:17 -0800, Pierre-Yves David wrote: > > > > > > > > > > On 11/30/2014 05:06 AM, Antoine Pitrou wrote: > > > > > > On Sat, 29 Nov 2014 17:57:42 -0800 > > > > > > Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > > > > >> # HG changeset patch > > > > > >> # User Pierre-Yves David <pierre-yves.david@fb.com> > > > > > >> # Date 1417049911 28800 > > > > > >> # Wed Nov 26 16:58:31 2014 -0800 > > > > > >> # Node ID d0f3dac4ea2b4aff51946c7db0834aa4e5c3e82a > > > > > >> # Parent 04eb7e49d2b6f90f71aa85de9ad0b4d70670d688 > > > > > >> obsstore: disable garbage collection during initialisation (issue4456) > > > > > >> > > > > > >> Python garbage collection is triggered by contained creation. So code that > > > > > >> creates a lot of tuple tends to trigger GC a lot. We disable the gc during > > > > > >> obsolescence marker parsing and associated initialization. The provide and > > > > > >> interesting speedup (25%). > > > > > >> > > > > > >> On my 58758 markers repo: > > > > > >> before: 0.468247 seconds > > > > > >> after: 0.344362 seconds > > > > > > > > > > > > Which Python version is that? > > > > > > > > > > Python 2.7.8 (default, Oct 18 2014, 12:50:18) > > > > > [GCC 4.9.1] > > > > > > > > Python's GC behavior while building large containers is quite > > > > questionable (aka quadratic). > > > > > > It shouldn't, see https://hg.python.org/cpython/rev/79276316b94b/ > > > > > > If you have a reproducer, please open an issue at bugs.python.org. > > > > You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, > > it's down to O(n): generic construction of lists of tuples gets about 4x > > slower. So it's still definitely worth disabling for any potentially > > large list or dict we might build. And obsolete markers are in that > > category. > > Thanks. I'm wondering if there would be a way to improve our current > heuristics to alleviate such issues. Maybe adapt the GC threshold proportional to the number of active objects seen at the last GC. "You've got a million objects already? I shouldn't make a fuss when you create another 100000 objects.."
On Fri, 05 Dec 2014 14:50:02 -0600 Matt Mackall <mpm@selenic.com> wrote: > > > You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, > > > it's down to O(n): generic construction of lists of tuples gets about 4x > > > slower. So it's still definitely worth disabling for any potentially > > > large list or dict we might build. And obsolete markers are in that > > > category. > > > > Thanks. I'm wondering if there would be a way to improve our current > > heuristics to alleviate such issues. > > Maybe adapt the GC threshold proportional to the number of active > objects seen at the last GC. "You've got a million objects already? I > shouldn't make a fuss when you create another 100000 objects.." I'd still like to be sure there's a problem, with a simple reproducer. Let's see a microbenchmark. First, it's clear that building up a list of tuples /looks/ faster when the GC is disabled: $ python2.7 -m timeit -s "import gc" "gc.enable(); l = [(x, x) for x in range(1000000)]" 10 loops, best of 3: 137 msec per loop $ python2.7 -m timeit -s "import gc" "gc.disable(); l = [(x, x) for x in range(1000000)]" 10 loops, best of 3: 107 msec per loop But, when the next full collection comes, and if the container hasn't been deleted in the meantime, the full price is paid by the GC-disabling version: $ python2.7 -m timeit -s "import gc" "gc.enable(); l = [(x, x) for x in range(1000000)]; gc.collect()" 10 loops, best of 3: 155 msec per loop $ python2.7 -m timeit -s "import gc" "gc.disable(); l = [(x, x) for x in range(1000000)]; gc.collect()" 10 loops, best of 3: 160 msec per loop As you mentioned (or was it marmoute?), this may be due to the cost of walking and untracking the tuples; temporarily disabling the GC merely delays that operation, it doesn't eliminate it. Regards Antoine.
On Fri, 2014-12-05 at 22:34 +0100, Antoine Pitrou wrote: > On Fri, 05 Dec 2014 14:50:02 -0600 > Matt Mackall <mpm@selenic.com> wrote: > > > > You're right, the quadratic behavior appears to be gone in 2.7. In 2.7, > > > > it's down to O(n): generic construction of lists of tuples gets about 4x > > > > slower. So it's still definitely worth disabling for any potentially > > > > large list or dict we might build. And obsolete markers are in that > > > > category. > > > > > > Thanks. I'm wondering if there would be a way to improve our current > > > heuristics to alleviate such issues. > > > > Maybe adapt the GC threshold proportional to the number of active > > objects seen at the last GC. "You've got a million objects already? I > > shouldn't make a fuss when you create another 100000 objects.." > > I'd still like to be sure there's a problem, with a simple reproducer. Here's a test: import gc import time baseline = [] for x in xrange(10000): baseline.append((x, time.time())) gc.collect() b2 = [] begin = time.time() #gc.disable() for x in xrange(1000000): b2.append((x, [])) gc.enable() gc.collect() print time.time() - begin
Patch
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py --- a/mercurial/obsolete.py +++ b/mercurial/obsolete.py @@ -66,10 +66,11 @@ The file starts with a version header: The header is followed by the markers. Marker format depend of the version. See comment associated with each format for details. """ import struct +import gc import util, base85, node import phases from i18n import _ _pack = struct.pack @@ -466,12 +467,28 @@ class obsstore(object): self.sopener = sopener data = sopener.tryread('obsstore') self._version = defaultformat self._readonly = readonly if data: - self._version, markers = _readmarkers(data) - self._load(markers) + # Python's garbage collector triggers a GC each time a certain + # number of container objects (the number being defined by + # gc.get_threshold()) are allocated. Markers parsing creates + # multiple tuples while parsing each markers so the gc is triggered + # a lot while parsing an high number of markers. As a workaround, + # disable GC during initialisation. + # + # This would probably marker parsing during exchange but I do not + # expect the order of magnitude to matter outside of initialisation + # case. + gcenabled = gc.isenabled() + gc.disable() + try: + self._version, markers = _readmarkers(data) + self._load(markers) + finally: + if gcenabled: + gc.enable() def __iter__(self): return iter(self._all) def __len__(self):