Patchwork [1,of,2,obs-speedup] obsolete: add C implementation of _addsuccessors

login
register
mail settings
Submitter Augie Fackler
Date Aug. 26, 2015, 2:09 p.m.
Message ID <CAPLqtWJ1c4_d3mqAVJJCHrTMUQRK0Y=VCs+tHCYe37f_xw9gpQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/10293/
State Rejected
Headers show

Comments

Augie Fackler - Aug. 26, 2015, 2:09 p.m.
On Tue, Aug 25, 2015 at 4:52 PM, Matt Mackall <mpm@selenic.com> wrote:
> On Mon, 2015-08-24 at 14:25 -0400, Augie Fackler wrote:
>> # HG changeset patch
>> # User Augie Fackler <augie@google.com>
>> # Date 1440439282 14400
>> #      Mon Aug 24 14:01:22 2015 -0400
>> # Node ID 3aecfdbfe3ca54134cdb0e98944e8fca53bd46f1
>> # Parent  207e95720308518aa40c2a91ce20fdf52dfa8550
>> obsolete: add C implementation of _addsuccessors
>>
>> Before, best of 10 runs:
>> hg sl > /dev/null  0.75s user 0.07s system 99% cpu 0.828 total
>>
>> After, best of 10 runs:
>> ./hg sl > /dev/null  0.65s user 0.09s system 99% cpu 0.744 total
>>
>> where
>> alias.sl=log -Gr smart -Tsl
>> revsetalias.smart=(parents(not public()) or not public() or . or head()) and (not obsolete() or unstable()^)
>>
>> It's a straight-line port to C code, but still a 10% win on walltime
>> for the command. I can live with that.
>
> I'm kind of loathe to accept C code that isn't at least 4x or so faster
> than what it's replacing, but this 10% doesn't measure that directly.
> Can I get you to tweak perf.py to do that?

If my perf patch is right, this code didn't make much difference.

Before:
! wall 0.074395 comb 0.080000 user 0.070000 sys 0.010000 (best of 100)

After:
! wall 0.067937 comb 0.070000 user 0.070000 sys 0.000000 (best of 100)

So I guess we must be calling addsuccessors a lot of times or something.



is the perf patch, in case anything is obviously wrong there.

>
> --
> Mathematics is the supreme nostalgia of our time.
>
Augie Fackler - Aug. 26, 2015, 2:12 p.m.
On Wed, Aug 26, 2015 at 10:09 AM, Augie Fackler <raf@durin42.com> wrote:
>>
>> I'm kind of loathe to accept C code that isn't at least 4x or so faster
>> than what it's replacing, but this 10% doesn't measure that directly.
>> Can I get you to tweak perf.py to do that?
>
> If my perf patch is right, this code didn't make much difference.
>
> Before:
> ! wall 0.074395 comb 0.080000 user 0.070000 sys 0.010000 (best of 100)
>
> After:
> ! wall 0.067937 comb 0.070000 user 0.070000 sys 0.000000 (best of 100)
>
> So I guess we must be calling addsuccessors a lot of times or something.


Or I'm just crazy. We definitely only call the function once, and I no
longer see a significant win using hgperf. Sorry for the noise.
Matt Mackall - Aug. 27, 2015, 8:14 p.m.
On Wed, 2015-08-26 at 10:09 -0400, Augie Fackler wrote:
> On Tue, Aug 25, 2015 at 4:52 PM, Matt Mackall <mpm@selenic.com> wrote:
> > On Mon, 2015-08-24 at 14:25 -0400, Augie Fackler wrote:
> >> # HG changeset patch
> >> # User Augie Fackler <augie@google.com>
> >> # Date 1440439282 14400
> >> #      Mon Aug 24 14:01:22 2015 -0400
> >> # Node ID 3aecfdbfe3ca54134cdb0e98944e8fca53bd46f1
> >> # Parent  207e95720308518aa40c2a91ce20fdf52dfa8550
> >> obsolete: add C implementation of _addsuccessors
> >>
> >> Before, best of 10 runs:
> >> hg sl > /dev/null  0.75s user 0.07s system 99% cpu 0.828 total
> >>
> >> After, best of 10 runs:
> >> ./hg sl > /dev/null  0.65s user 0.09s system 99% cpu 0.744 total
> >>
> >> where
> >> alias.sl=log -Gr smart -Tsl
> >> revsetalias.smart=(parents(not public()) or not public() or . or head()) and (not obsolete() or unstable()^)
> >>
> >> It's a straight-line port to C code, but still a 10% win on walltime
> >> for the command. I can live with that.
> >
> > I'm kind of loathe to accept C code that isn't at least 4x or so faster
> > than what it's replacing, but this 10% doesn't measure that directly.
> > Can I get you to tweak perf.py to do that?
> 
> If my perf patch is right, this code didn't make much difference.

Huh. That's a bit unexpected, but I can't spot a problem.

Patch

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -576,3 +576,10 @@  def perfloadmarkers(ui, repo):
     timer, fm = gettimer(ui)
     timer(lambda: len(obsolete.obsstore(repo.svfs)))
     fm.end()
+
+@command('perfaddsuccessors')
+def perfaddsuccessors(ui, repo):
+    timer, fm = gettimer(ui)
+    o = repo.obsstore
+    timer(lambda : obsolete._addsuccessors({}, o._all) or len(o))
+    fm.end()