Patchwork [STABLE] cext: fix memory leak in phases computation

Submitter Georges Racinet
Date June 6, 2021, 10:29 p.m.
Message ID <be560b55eb7cfe25c68f.1623018560@purity.tombe.racinet.fr>
Permalink /patch/49156/
State New

Comments

Georges Racinet - June 6, 2021, 10:29 p.m.
# HG changeset patch
# User Georges Racinet <georges.racinet@octobus.net>
# Date 1622935470 -7200
#      Sun Jun 06 01:24:30 2021 +0200
# Branch stable
# Node ID be560b55eb7cfe25c68fb6fab5417fab6688cf84
# Parent  5ac0f2a8ba7205266a206ad8da89a79173e8efea
# EXP-Topic memleak-phases
cext: fix memory leak in phases computation

Without this fix, a buffer whose size in bytes equals the number of
changesets in the repository is leaked each time the repository is
opened and changeset phases are computed.

Impact: the current code in hgwebdir creates a new `localrepository`
instance for each HTTP request. Since any pull or push is made of several
requests, a team of 100 people can easily produce thousands of such
requests per day.

Because the buffer comes from a low-level malloc, the leak is invisible to
the gc module and to tools relying on it, but valgrind spotted it immediately.
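
The blind spot described above can be illustrated outside Mercurial. What
follows is only a minimal sketch, assuming a POSIX system where `ctypes` can
load the C library. The helper names `leak_with_malloc` and `traced_growth`
and the 16 MiB size are illustrative assumptions, not part of the patch, and
`tracemalloc` is used here as a stand-in for Python-level memory tooling in
general. The buffer is obtained from the C `malloc` directly, so the Python
allocator hooks never hear about it.

```python
import ctypes
import ctypes.util
import tracemalloc

# Assumption: a POSIX-like platform whose C library exposes malloc/memset.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]


def leak_with_malloc(size):
    """Allocate `size` bytes with C malloc and never free them (hypothetical)."""
    ptr = libc.malloc(size)
    if not ptr:
        raise MemoryError("malloc failed")
    libc.memset(ptr, 0, size)  # touch the pages so they count towards RSS
    return ptr


def traced_growth(size):
    """Return how many bytes tracemalloc attributes to one leaked buffer."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    leak_with_malloc(size)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


LEAK_SIZE = 16 * 1024 * 1024  # 16 MiB, arbitrary illustrative size
growth = traced_growth(LEAK_SIZE)
print("tracemalloc saw %d bytes for a %d byte leak" % (growth, LEAK_SIZE))
```

The traced growth should stay far below the leaked size, which is why a
tool working at the process level, such as valgrind, is needed for this
class of leak.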

Reproduction
------------

  for i in range(cl_args.iterations):
      repo = hg.repository(baseui, repo_path)
      rev = repo.revs(rev).first()
      ctx = repo[rev]

  del ctx
  del repo
  # avoid any pollution by other types of leaks
  # (that should be fixed in 5.8)
  repoview._filteredrepotypes.clear()

  gc.collect()
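
For readers who want to wrap a loop like the one above in a measurement,
here is a hedged sketch of a small harness. It assumes a POSIX system (the
`resource` module) and reports the peak RSS rather than the instantaneous
one, which is close enough when memory only grows. `run_iterations` and
`fake_workload` are hypothetical names: the real workload would be the
repository-opening loop body, which is not reproduced here.

```python
import gc
import resource


def peak_rss():
    """Peak resident set size of this process.

    Units follow getrusage(2): kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_iterations(workload, iterations):
    """Call `workload` repeatedly, collect garbage, then report peak RSS."""
    for _ in range(iterations):
        workload()
    gc.collect()  # rule out memory that Python itself could still reclaim
    return peak_rss()


def fake_workload():
    """Stand-in for opening the repository and computing phases (assumption)."""
    data = [0] * 1000
    del data


print("peak RSS after 100 iterations:", run_iterations(fake_workload, 100))
```

Running the harness with two different iteration counts and comparing the
results is what makes a leak visible: a leak-free workload should plateau.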

Measurements
------------

Resident Set Size (RSS), taken on a clone of
mozilla-central for performance analysis (440 000
changesets).

before:
  5.8+hg19.5ac0f2a8ba72  1000 iterations: 1606MB
  5.8+hg19.5ac0f2a8ba72 10000 iterations: 5723MB
after:
  5.8+hg20.e2084d39e145  1000 iterations:  555MB
  5.8+hg20.e2084d39e145 10000 iterations:  555MB
         (double checked, not a copy/paste error)

(e2084d39e14 is the present changeset, before amendment
of the message to add the measurements)
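
As a back-of-the-envelope check of the figures above, the growth per
iteration can be compared with the stated leak size of one byte per
changeset. This sketch assumes that growth is linear between the two
measurements and that "MB" means 10**6 bytes; neither is stated in the
message. The function name `growth_per_iteration_mb` is illustrative.

```python
CHANGESETS = 440_000  # repository size quoted in the measurements

# RSS in MB, keyed by iteration count, copied from the measurements.
BEFORE_MB = {1000: 1606, 10000: 5723}
AFTER_MB = {1000: 555, 10000: 555}


def growth_per_iteration_mb(samples):
    """Slope of RSS versus iteration count between the two samples."""
    (n1, rss1), (n2, rss2) = sorted(samples.items())
    return (rss2 - rss1) / (n2 - n1)


before_slope = growth_per_iteration_mb(BEFORE_MB)
after_slope = growth_per_iteration_mb(AFTER_MB)
expected_mb = CHANGESETS / 1e6  # one byte per changeset, in MB (assumed 10**6)

print("before: %.3f MB per iteration" % before_slope)
print("after:  %.3f MB per iteration" % after_slope)
print("expected leak per iteration: %.3f MB" % expected_mb)
```

The slope before the fix is about 0.457 MB per iteration, the same order
as the 0.44 MB expected from a rounded count of 440 000 changesets, and
the slope after the fix is zero.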
Georges Racinet - June 6, 2021, 10:35 p.m.
This can also be pulled directly from
https://foss.heptapod.net/octobus/mercurial-devel

Corresponding CI run:
https://foss.heptapod.net/octobus/mercurial-devel/-/pipelines/22818

Thanks!
Joerg Sonnenberger - June 6, 2021, 11:10 p.m.
LGTM.

Joerg
Raphaël Gomès - June 7, 2021, 8:25 a.m.
Also looks correct to me. Since Joerg and the CI agree with this, I'll 
go ahead and queue this patch.

Thanks Georges

Patch

diff -r 5ac0f2a8ba72 -r be560b55eb7c mercurial/cext/revlog.c
--- a/mercurial/cext/revlog.c	Thu May 20 14:20:39 2021 -0400
+++ b/mercurial/cext/revlog.c	Sun Jun 06 01:24:30 2021 +0200
@@ -919,6 +919,7 @@ 
 		phasesets[i] = NULL;
 	}
 
+	free(phases);
 	return Py_BuildValue("nN", len, phasesetsdict);
 
 release: