Patchwork dirstate: disable gc while parsing the dirstate

login
register
mail settings
Submitter Siddharth Agarwal
Date Feb. 10, 2013, 3:24 p.m.
Message ID <8c50a5734bb2b8833eb4.1360509840@sid0x220>
Download mbox | patch
Permalink /patch/936/
State Superseded
Commit 0969980308c7f728811b956018787c9f1b01ec69
Headers show

Comments

Siddharth Agarwal - Feb. 10, 2013, 3:24 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1360509284 0
# Node ID 8c50a5734bb2b8833eb425db95ce26073322d31d
# Parent  6fc7952d6819acbc212cdde688edc256c9bd4b2e
dirstate: disable gc while parsing the dirstate
Brodie Rao - Feb. 10, 2013, 3:28 p.m.
On Sun, Feb 10, 2013 at 3:24 PM, Siddharth Agarwal <sid0@fb.com> wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1360509284 0
> # Node ID 8c50a5734bb2b8833eb425db95ce26073322d31d
> # Parent  6fc7952d6819acbc212cdde688edc256c9bd4b2e
> dirstate: disable gc while parsing the dirstate
>
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -9,7 +9,7 @@ import errno
>  from node import nullid
>  from i18n import _
>  import scmutil, util, ignore, osutil, parsers, encoding
> -import os, stat, errno
> +import os, stat, errno, gc
>
>  propertycache = util.propertycache
>  filecache = scmutil.filecache
> @@ -285,7 +285,19 @@ class dirstate(object):
>          if not st:
>              return
>
> +        # Python's garbage collector triggers a GC each time a certain number
> +        # of container objects (the number being defined by
> +        # gc.get_threshold()) are allocated. parse_dirstate creates a tuple
> +        # for each file in the dirstate. The C version then immediately marks
> +        # them as not to be tracked by the collector. However, this has no
> +        # effect on when GCs are triggered, only on what objects the GC looks
> +        # into. This means that O(number of files) GCs are unavoidable.
> +        # Depending on when in the process's lifetime the dirstate is parsed,
> +        # this can get very expensive. As a workaround, disable GC while
> +        # parsing the dirstate.
> +        gc.disable()
>          p = parsers.parse_dirstate(self._map, self._copymap, st)
> +        gc.enable()
>          if not self._dirtypl:
>              self._pl = p

Can you provide some before and after performance numbers?

>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Durham Goode - Feb. 10, 2013, 3:43 p.m.
>+        gc.disable()
>         p = parsers.parse_dirstate(self._map, self._copymap, st)
>+        gc.enable()
>         if not self._dirtypl:
>             self._pl = p

Probably needs to be in a try/finally?
Matt Mackall - Feb. 10, 2013, 3:47 p.m.
On Sun, 2013-02-10 at 15:43 +0000, Durham Goode wrote:
> >+        gc.disable()
> >         p = parsers.parse_dirstate(self._map, self._copymap, st)
> >+        gc.enable()
> >         if not self._dirtypl:
> >             self._pl = p
> 
> Probably needs to be in a try/finally?

Yes, you're correct.
Siddharth Agarwal - Feb. 10, 2013, 3:53 p.m.
On 02/10/2013 03:28 PM, Brodie Rao wrote:
> Can you provide some before and after performance numbers? 

It fixes a perf regression a patch I'm going to bomb shortly introduces 
(because it indirectly delays parsing the manifest a bit). For a repo 
with 150,000 files, it speeds up a no-op update from 2.13 seconds to 1.98.
Siddharth Agarwal - Feb. 10, 2013, 4:06 p.m.
On 02/10/2013 03:53 PM, Siddharth Agarwal wrote:
> On 02/10/2013 03:28 PM, Brodie Rao wrote:
>> Can you provide some before and after performance numbers? 
>
> It fixes a perf regression a patch I'm going to bomb shortly 
> introduces (because it indirectly delays parsing the manifest a bit).

Sorry, I meant parsing the *dirstate*.

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -9,7 +9,7 @@  import errno
 from node import nullid
 from i18n import _
 import scmutil, util, ignore, osutil, parsers, encoding
-import os, stat, errno
+import os, stat, errno, gc
 
 propertycache = util.propertycache
 filecache = scmutil.filecache
@@ -285,7 +285,19 @@  class dirstate(object):
         if not st:
             return
 
+        # Python's garbage collector triggers a GC each time a certain number
+        # of container objects (the number being defined by
+        # gc.get_threshold()) are allocated. parse_dirstate creates a tuple
+        # for each file in the dirstate. The C version then immediately marks
+        # them as not to be tracked by the collector. However, this has no
+        # effect on when GCs are triggered, only on what objects the GC looks
+        # into. This means that O(number of files) GCs are unavoidable.
+        # Depending on when in the process's lifetime the dirstate is parsed,
+        # this can get very expensive. As a workaround, disable GC while
+        # parsing the dirstate.
+        gc.disable()
         p = parsers.parse_dirstate(self._map, self._copymap, st)
+        gc.enable()
         if not self._dirtypl:
             self._pl = p