Patchwork [STABLE] revset: fix O(2^n) perf regression in addset

login
register
mail settings
Submitter Durham Goode
Date Oct. 28, 2014, 9:20 p.m.
Message ID <247213e31a99056aa64b.1414531227@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/6489/
State Accepted
Headers show

Comments

Durham Goode - Oct. 28, 2014, 9:20 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1414530366 25200
#      Tue Oct 28 14:06:06 2014 -0700
# Branch stable
# Node ID 247213e31a99056aa64b91d19592dcb72b436522
# Parent  6df4bc39f6828e8aa74b33e51bc72eac621db2fe
revset: fix O(2^n) perf regression in addset

hg log -r 1 ... -r 100 was never returning due to a regression in the way addset
computes __nonzero__.  It used 'bool(self._r1 or self._r2)' which required
executing self._r1.__nonzero__ twice (once for the or, once for the bool).  hg
log with a lot of -r's happens to build a one sided addset tree of N length,
which ends up being 2^N performance.

This patch fixes it by converting to bool before or'ing.

This problem can be repro'd with something as simple as:

hg log `for x in $(seq 1 50) ; do echo "-r $x "; done`

Adding '1 + 2 + ... + 20' to the revsetbenchmark.txt didn't seem to repro the
problem, so I wasn't able to add a revset benchmark for this issue.
Matt Mackall - Oct. 28, 2014, 9:32 p.m.
On Tue, 2014-10-28 at 14:20 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1414530366 25200
> #      Tue Oct 28 14:06:06 2014 -0700
> # Branch stable
> # Node ID 247213e31a99056aa64b91d19592dcb72b436522
> # Parent  6df4bc39f6828e8aa74b33e51bc72eac621db2fe
> revset: fix O(2^n) perf regression in addset

Queued for stable, thanks.
Pierre-Yves David - Nov. 1, 2014, 7:13 p.m.
On 10/28/2014 09:20 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1414530366 25200
> #      Tue Oct 28 14:06:06 2014 -0700
> # Branch stable
> # Node ID 247213e31a99056aa64b91d19592dcb72b436522
> # Parent  6df4bc39f6828e8aa74b33e51bc72eac621db2fe
> revset: fix O(2^n) perf regression in addset
>
> hg log -r 1 ... -r 100 was never returning due to a regression in the way addset
> computes __nonzero__.  It used 'bool(self._r1 or self._r2)' which required
> executing self._r1.__nonzero__ twice (once for the or, once for the bool).  hg
> log with a lot of -r's happens to build a one sided addset tree of N length,
> which ends up being 2^N performance.
>
> This patch fixes it by converting to bool before or'ing.
>
> This problem can be repro'd with something as simple as:
>
> hg log `for x in $(seq 1 50) ; do echo "-r $x "; done`

Good catch, thanks. It seems like it would be valuable to cache the 
__nonzero__ return too. If you agree I'll send a patch.

> Adding '1 + 2 + ... + 20' to the revsetbenchmark.txt didn't seem to repro the
> problem, so I wasn't able to add a revset benchmark for this issue.

The revset benchmar is only doing iteration testing. I'm adding more 
versatility to the benchmark but this is not ready yet.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2500,7 +2500,7 @@  class addset(abstractsmartset):
         return len(self._list)
 
     def __nonzero__(self):
-        return bool(self._r1 or self._r2)
+        return bool(self._r1) or bool(self._r2)
 
     @util.propertycache
     def _list(self):