Patchwork [2,of,6,v2] util: add an elapsed time wrapper

Submitter Simon Farnsworth
Date Feb. 2, 2017, 7:18 p.m.
Message ID <12d0ac224bb34691d44a.1486063136@devvm022.lla2.facebook.com>
Permalink /patch/18301/
State Accepted

Comments

Simon Farnsworth - Feb. 2, 2017, 7:18 p.m.
# HG changeset patch
# User Simon Farnsworth <simonfar@fb.com>
# Date 1486063056 28800
#      Thu Feb 02 11:17:36 2017 -0800
# Node ID 12d0ac224bb34691d44a2cead5b9795a6cfc2490
# Parent  f443bd95a948aaed36e81edb884085fc2f0f5acf
util: add an elapsed time wrapper

We want to log the time spent in various commands. Add a wrapper class that
can be used to measure time taken by a function or all members of a class.
Bryan O'Sullivan - Feb. 2, 2017, 7:32 p.m.
On Thu, Feb 2, 2017 at 11:18 AM, Simon Farnsworth <simonfar@fb.com> wrote:

> util: add an elapsed time wrapper
>

To be honest, this seems like a heavily over-engineered approach to me.

As it happens, I've got a less mature patch set in the same space, which
I'll put out as an RFC within the next few days, so we can compare and
contrast the two.
Bryan O'Sullivan - Feb. 3, 2017, 12:54 a.m.
On Thu, Feb 2, 2017 at 11:32 AM, Bryan O'Sullivan <bos@serpentine.com>
wrote:

>
> To be honest, this seems like a heavily over-engineered approach to me.
>

I don't like giving such vague feedback, but I was hopping off the shuttle
this morning and couldn't write anything more concrete at the time. Sorry
about that.

I also feel somewhat apologetic about seagulling in abruptly with a
critique after months and months of silence, but ... hi?

I wanted to substantiate my sense of unease with this patchset. Here's what
I came up with.

First, you'll want to take a look at my different approach, which I just
sent out a moment ago. (It's not cooked, and I forgot to add the "RFC" tag,
but you get the idea.)

Next, performance. Here's a little benchmark: http://pastebin.com/f9eFEWiq

On my laptop, base Mercurial runs it in 0.15 seconds.
With performance logging enabled, your patchset takes 0.36 seconds.
And my patchset takes 0.16, with no need to enable or disable anything.

To be clear, the performance numbers should not be persuasive, because a
typical invocation of Mercurial will call ui.write maybe 1% as often as the
benchmark does. But what they do is give me another angle from which to
look at "this doesn't feel right to me".

Your patch adds a bunch of allocation and indirection on what for this
benchmark is a fast path, which is why it appears so costly.

Abstractions in Python have a heavy runtime cost and of course the usual
"now I have an abstraction to understand" overhead, and so you have to be
particularly judicious in creating and using them.

The above approach of indirection via a wrapper object doesn't meet my
personal "abstraction that pays for the cost of understanding and using it"
bar, especially as the alternative approach in my patchset is both simple
to understand and has almost no overhead.

Also, the way that you're conditionally recording time used in both this
and later patches (for crecord and extdiff) strikes me as unnecessarily
complex. My patches demonstrate that doing so unconditionally is simpler
and has inconsequential overhead.
Simon Farnsworth - Feb. 3, 2017, 10:26 p.m.
On 03/02/2017 00:54, Bryan O'Sullivan wrote:
> On Thu, Feb 2, 2017 at 11:32 AM, Bryan O'Sullivan <bos@serpentine.com
> <mailto:bos@serpentine.com>> wrote:
>
>
>     To be honest, this seems like a heavily over-engineered approach to me.
>
>
> I don't like giving such vague feedback, but I was hopping off the
> shuttle this morning and couldn't write anything more concrete at the
> time. Sorry about that.
>
> I also feel somewhat apologetic about seagulling in abruptly with a
> critique after months and months of silence, but ... hi?
>
> I wanted to substantiate my sense of unease with this patchset. Here's
> what I came up with.
>
> First, you'll want to take a look at my different approach, which I just
> sent out a moment ago. (It's not cooked, and I forgot to add the "RFC"
> tag, but you get the idea.)
>
> Next, performance. Here's a little
> benchmark: http://pastebin.com/f9eFEWiq
>
> On my laptop, base Mercurial runs it in 0.15 seconds.
> With performance logging enabled, your patchset takes 0.36 seconds.
> And my patchset takes 0.16, with no need to enable or disable anything.
>
> To be clear, the performance numbers should not be persuasive, because a
> typical invocation of Mercurial will call ui.write maybe 1% as often as
> the benchmark does. But what they do is give me another angle from which
> to look at "this doesn't feel right to me".
>
I'd assumed that with Mercurial encouraging extension authors to wrap 
functions, the overhead wasn't significant. Your patchset and message 
had me run a couple of experiments on my Mac, using the timeit module, 
and a trivial function:

def func():
	print "Hello"

This is both trivial and of the same order of complexity as the 
functions I'm trying to measure.

The baseline I got was that 1,000,000 invocations of this function take 
2.0 seconds elapsed time.

I then added if True to the function, to get a sense of the pain of 
making it conditional:

def func():
	if True:
		print "Hello"

This takes 2.3 seconds for the same million repetitions.

I then restored the unconditional version of func(); using a simple 
wrapper that just does if True: func(), I get 2.4 seconds, while a 
class-based unconditional wrapper is 2.7 seconds:

class wrapper:
	def __init__(self, func):
		self.func = func
	def __call__(self, *args, **kwargs):
		self.func(*args, **kwargs)

It looks to me like the conditional actually adds significant pain in 
its own right, almost as much as the complex class based wrapper does.
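The timings above can be reproduced with a small timeit harness along these lines. This is a sketch, not the exact script used: the measured function is a no-op rather than a print so the loop doesn't spam a terminal, and absolute numbers will vary by machine, but the relative cost of the conditional, the function wrapper, and the class wrapper should follow the same pattern.

```python
import timeit

def func():
    pass  # stand-in for the trivial function (the original printed "Hello")

def func_conditional():
    # same body, guarded by a trivially-true conditional
    if True:
        pass

def simple_wrapper():
    # function-based wrapper: if True: func()
    if True:
        func()

class wrapper:
    # class-based unconditional wrapper, as in the email above
    def __init__(self, f):
        self.func = f
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

wrapped = wrapper(func)

N = 1000000
baseline = timeit.timeit(func, number=N)
conditional = timeit.timeit(func_conditional, number=N)
fn_wrapped = timeit.timeit(simple_wrapper, number=N)
cls_wrapped = timeit.timeit(wrapped, number=N)

print("baseline     %.2fs" % baseline)
print("conditional  %.2fs" % conditional)
print("fn wrapper   %.2fs" % fn_wrapped)
print("cls wrapper  %.2fs" % cls_wrapped)
```

With a no-op body the wrapper overhead dominates even more than in the print-based experiment, which only strengthens the conclusion about class-based indirection on a fast path.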

Assuming the community doesn't object strongly, I'll redo to be 
unconditionally measuring time (and only conditionally logging it), as 
the overhead of conditionality seems excessive for this purpose.

> Your patch adds a bunch of allocation and indirection on what for this
> benchmark is a fast path, which is why it appears so costly.
>
> Abstractions in Python have a heavy runtime cost and of course the usual
> "now I have an abstraction to understand" overhead, and so you have to
> be particularly judicious in creating and using them.
>
> The above approach of indirection via a wrapper object doesn't meet my
> personal "abstraction that pays for the cost of understanding and using
> it" bar, especially as the alternative approach in my patchset is both
> simple to understand and has almost no overhead.
>
> Also, the way that you're conditionally recording time used in both this
> and later patches (for crecord and extdiff) strikes me as unnecessarily
> complex. My patches demonstrate that doing so unconditionally is simpler
> and has inconsequential overhead.

I definitely want the reporting to be conditional - if you're not going 
to use the time, we shouldn't output it. My mental model for Python 
costs was wrong (and is now corrected), and I'll redo the time gathering 
accordingly.

From my point of view, the goals are:

  1. Establish groups of "not Mercurial's fault" slowness - stdio, 
extdiff, editor invoked for histedit etc. This combines with FB infra to 
let us focus on cases where Mercurial code is the bottleneck.

  2. Do so in such a way that users not on FB infra aren't affected 
unless they choose to be, even if they're using things like logtoprocess 
today.

It looks like I met goal 2 at the expense of goal 1, and I'm going to 
need to rework to get a nicer compromise in place between the goals.
Bryan O'Sullivan - Feb. 3, 2017, 10:36 p.m.
On Fri, Feb 3, 2017 at 2:26 PM, Simon Farnsworth <simonfar@fb.com> wrote:

I'd assumed that with Mercurial encouraging extension authors to wrap
> functions, the overhead wasn't significant.


Well, those functions are usually called just a tiny handful of times, so
that tends to be true in those cases.


> It looks to me like the conditional actually adds significant pain in its
> own right, almost as much as the complex class based wrapper does.
>

Yes.


> Assuming the community doesn't object strongly, I'll redo to be
> unconditionally measuring time (and only conditionally logging it), as the
> overhead of conditionality seems excessive for this purpose.


If you'd also use the same approach as I do of direct measurement in the
places that actually do the I/O, instead of indirection, that would be
great. In fact, I think you should take over my patches and adapt them to
your needs. The added detail in my patches of measuring when Mercurial
stops doing work, instead of when it exits, is going to be important to
your purposes, because just counting ui I/O time won't capture that.

I definitely want the reporting to be conditional - if you're not going to
> use the time, we shouldn't output it.


Clearly.

I also think there's no value to tracking time on stdin, stdout, and stderr
separately, which is why I didn't do that – but that there *is* value in
measuring invocations of subprocesses.
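The "direct measurement" style being advocated here can be sketched as follows. The accumulator name and helper are illustrative, not Mercurial's actual API: the point is that measuring unconditionally costs only two clock reads per call, while reporting can remain conditional.

```python
import subprocess
import sys
import time

blockedms = 0  # hypothetical accumulator for time spent blocked on a subprocess

def timedsystem(cmd):
    """Run a subprocess, unconditionally measuring the time spent blocked.

    The measurement itself is just two time.time() calls; whether the
    accumulated figure is ever reported can stay conditional.
    """
    global blockedms
    start = time.time()
    try:
        return subprocess.call(cmd)
    finally:
        blockedms += (time.time() - start) * 1000

# any external command would do here; a no-op child keeps the sketch runnable
rc = timedsystem([sys.executable, "-c", "pass"])
```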


> From my point of view, the goals are:
>

>  1. Establish groups of "not Mercurial's fault" slowness - stdio, extdiff,
> editor invoked for histedit etc. This combines with FB infra to let us
> focus on cases where Mercurial code is the bottleneck.
>
>  2. Do so in such a way that users not on FB infra aren't affected unless
> they choose to be, even if they're using things like logtoprocess today.
>

Yep. If you do the measurement directly, it's cheap enough that you don't
need to worry about it.
Simon Farnsworth - Feb. 3, 2017, 10:55 p.m.
On 03/02/2017 22:36, Bryan O'Sullivan wrote:
> On Fri, Feb 3, 2017 at 2:26 PM, Simon Farnsworth <simonfar@fb.com
> <mailto:simonfar@fb.com>> wrote:
>
>     Assuming the community doesn't object strongly, I'll redo to be
>     unconditionally measuring time (and only conditionally logging it),
>     as the overhead of conditionality seems excessive for this purpose.
>
>
> If you'd also use the same approach as I do of direct measurement in the
> places that actually do the I/O, instead of indirection, that would be
> great. In fact, I think you should take over my patches and adapt them
> to your needs. The added detail in my patches of measuring when
> Mercurial stops doing work, instead of when it exits, is going to be
> important to your purposes, because just counting ui I/O time won't
> capture that.
>

We already capture start and exit times in the external wrapper process 
- so your added detail of measuring "useful work" (in addition to "total 
time") tells us if we're seeing a huge overhead in setup/teardown of 
Mercurial.
Bryan O'Sullivan - Feb. 3, 2017, 11 p.m.
On Fri, Feb 3, 2017 at 2:55 PM, Simon Farnsworth <simonfar@fb.com> wrote:

> We already capture start and exit times in the external wrapper process -
> so your added detail of measuring "useful work" (in addition to "total
> time") tells us if we're seeing a huge overhead in setup/teardown of
> Mercurial.
>

I think you might have missed a bit of crucial detail. If Mercurial sits
for 100 minutes after writing its output to the pager before the user quits
the pager, that's not going to show up as I/O time with your scheme, but it
is going to show up as elapsed time. This is why capturing elapsed time
alone is not useful. There's an indeterminate amount of time where
literally nothing is happening (it's not teardown, and it's not overhead,
it's just waiting for the pager) that you need to be able to discount.
Simon Farnsworth - Feb. 3, 2017, 11:04 p.m.
On 03/02/2017 23:00, Bryan O'Sullivan wrote:
>
> On Fri, Feb 3, 2017 at 2:55 PM, Simon Farnsworth <simonfar@fb.com
> <mailto:simonfar@fb.com>> wrote:
>
>     We already capture start and exit times in the external wrapper
>     process - so your added detail of measuring "useful work" (in
>     addition to "total time") tells us if we're seeing a huge overhead
>     in setup/teardown of Mercurial.
>
>
> I think you might have missed a bit of crucial detail. If Mercurial sits
> for 100 minutes after writing its output to the pager before the user
> quits the pager, that's not going to show up as I/O time with your
> scheme, but it is going to show up as elapsed time. This is why
> capturing elapsed time alone is not useful. There's an
> indeterminate amount of time where literally nothing is happening (it's
> not teardown, and it's not overhead, it's just waiting for the pager)
> that you need to be able to discount.

Yes - I'd accounted for that in an early prototype (which included 
pager.wait() in stdio time) but lost it by this version.

Thanks for the reminder!
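The prototype behaviour described here, counting the wait for the pager as stdio time, might look like the following sketch. The `stdiotimems` accumulator and `runpager` helper are made-up names for illustration; a stand-in child process that drains stdin plays the role of the pager so the sketch runs non-interactively.

```python
import subprocess
import sys
import time

stdiotimems = 0  # hypothetical stdio-time accumulator

def runpager(pagercmd, output):
    """Feed output to a pager and count the wait for it as stdio time.

    Including pager.wait() in the measured span means the indeterminate
    time a user spends reading inside the pager is discounted from the
    time attributed to Mercurial itself.
    """
    global stdiotimems
    start = time.time()
    pager = subprocess.Popen(pagercmd, stdin=subprocess.PIPE)
    pager.stdin.write(output)
    pager.stdin.close()
    pager.wait()
    stdiotimems += (time.time() - start) * 1000
    return pager.returncode

# a stand-in "pager" that just drains stdin
rc = runpager([sys.executable, "-c", "import sys; sys.stdin.buffer.read()"],
              b"some command output\n")
```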

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3543,3 +3543,38 @@ 
 
 # convenient shortcut
 dst = debugstacktrace
+
+class elapsedtimewrapper(object):
+    def __init__(self, orig):
+        self.__orig = orig
+        self.__istiming = False
+        self.counttime = True
+        self.elapsedms = 0
+
+    def __call__(self, *args, **kwargs):
+        if self.counttime:
+            return self.__time(self.__orig, *args, **kwargs)
+        else:
+            return self.__orig(*args, **kwargs)
+
+    def __getattr__(self, name):
+        origattr = getattr(self.__orig, name)
+        if self.counttime and callable(origattr):
+            def closure(*args, **kwargs):
+                return self.__time(origattr, *args, **kwargs)
+            return closure
+        else:
+            return origattr
+
+    def __time(self, func, *args, **kwargs):
+        if self.__istiming:
+            return func(*args, **kwargs)
+
+        start = time.time()
+        try:
+            self.__istiming = True
+            return func(*args, **kwargs)
+        finally:
+            duration = time.time() - start
+            self.elapsedms += duration * 1000
+            self.__istiming = False
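For readers skimming the archive, here is the class from the patch lifted out of diff form, with a minimal usage sketch. The wrapped callable `slowop` is made up for illustration; the class body is as in the patch above.

```python
import time

class elapsedtimewrapper(object):
    """Wrap a callable or object, accumulating elapsed milliseconds."""
    def __init__(self, orig):
        self.__orig = orig
        self.__istiming = False      # guards against double-counting re-entrant calls
        self.counttime = True
        self.elapsedms = 0

    def __call__(self, *args, **kwargs):
        if self.counttime:
            return self.__time(self.__orig, *args, **kwargs)
        else:
            return self.__orig(*args, **kwargs)

    def __getattr__(self, name):
        # forward attribute access; time callable members when enabled
        origattr = getattr(self.__orig, name)
        if self.counttime and callable(origattr):
            def closure(*args, **kwargs):
                return self.__time(origattr, *args, **kwargs)
            return closure
        else:
            return origattr

    def __time(self, func, *args, **kwargs):
        if self.__istiming:
            return func(*args, **kwargs)
        start = time.time()
        try:
            self.__istiming = True
            return func(*args, **kwargs)
        finally:
            self.elapsedms += (time.time() - start) * 1000
            self.__istiming = False

# usage sketch: wrap a slow callable, then read back the accumulated time
def slowop():
    time.sleep(0.01)
    return 42

timed = elapsedtimewrapper(slowop)
result = timed()
```

After the call, `timed.elapsedms` holds roughly 10 ms; setting `timed.counttime = False` bypasses the timing path entirely, which is the conditionality the thread above debates.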