Patchwork [2,of,2] ui: record and print wall clock time

Submitter: Bryan O'Sullivan
Date: Feb. 3, 2017, 12:33 a.m.
Message ID: <97f1791e99b8def3903b.1486082002@bryano-mbp.local>
Permalink: /patch/18307/
State: Deferred

Comments

Bryan O'Sullivan - Feb. 3, 2017, 12:33 a.m.
# HG changeset patch
# User Bryan O'Sullivan <bryano@fb.com>
# Date 1486062809 28800
#      Thu Feb 02 11:13:29 2017 -0800
# Node ID 97f1791e99b8def3903b8e287c6192f17fbf0151
# Parent  61c6d4a49500f37aca3a7ae58b83d43c64eaa846
ui: record and print wall clock time
Bryan O'Sullivan - Feb. 3, 2017, 1:22 a.m.
On Thu, Feb 2, 2017 at 4:33 PM, Bryan O'Sullivan <bos@serpentine.com> wrote:

> ui: record and print wall clock time
>

This patch is a total hack, but it contains a few interesting nuggets that
it doesn't call out explicitly.

As Simon mentions in one of his patches, we log command performance so that
we can analyse it at scale and identify, and then fix, problems. But
today's simple measure of wallclock time is thrown into confusion by
anything interactive: calls to ssh (might ask for a password), prompts for
input, merges, pagers, you name it.

Measuring CPU time isn't any better, because a cold disk cache can result
in long I/O waits that aren't accounted for, but which a user will
definitely notice. So a wallclock metric that more accurately tracks when
Mercurial is "really doing work" offers significant value.
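The gap between the two clocks is easy to see directly. This is a toy sketch (not Mercurial code): it times a simulated I/O wait with both `time.time()` (wall clock) and `os.times()` (process CPU time), showing that a blocked process racks up wall time while consuming almost no CPU.

```python
import os
import time

def measure(func):
    """Run func, returning (wall_seconds, cpu_seconds) it consumed."""
    cpu_start = sum(os.times()[:2])   # user + system CPU time for this process
    wall_start = time.time()
    func()
    return time.time() - wall_start, sum(os.times()[:2]) - cpu_start

# Simulate an I/O wait (e.g. a cold-cache read, or waiting for ssh):
# wall time advances while CPU time barely moves.
wall, cpu = measure(lambda: time.sleep(0.3))
print("wall %.3f cpu %.3f" % (wall, cpu))
```

A CPU-only metric would report this command as nearly free, even though the user waited 300ms, which is exactly the blind spot described above.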

The current infrastructure for measuring time (when you use "hg --time")
runs as an atexit hook, so it stops counting when Mercurial is about to
exit. Thanks to the pager infrastructure, it's possible to exit long after
we've finished doing any real work, because we're waiting on the pager
child process to quit.
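A toy illustration of that overcounting (again, not Mercurial code): if the elapsed time is only read at process exit, it includes any time spent waiting on a child such as a pager, while a measurement taken when the real work finishes does not.

```python
import subprocess
import sys
import time

start = time.time()

# "Meaningful work": measure it where it ends, not at process exit.
time.sleep(0.1)
worktime = time.time() - start

# Waiting on a child (a stand-in for the pager) happens after the real
# work, but an exit-time measurement -- like an atexit hook -- would
# still count it.
subprocess.call([sys.executable, '-c', 'import time; time.sleep(0.3)'])
exittime = time.time() - start

print("work %.3f exit %.3f" % (worktime, exittime))
```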

As a quantity to measure, "how long between us starting a timer and the
process exiting" is arguably not (ever?) what we really want. Instead, I
think it's "how long between starting and finishing meaningful work", which
the "wall" figure in the patch above gets closer to. Being able to identify
time spent waiting on I/O or subprocesses (the "io" figure in the patch,
and the main focus of Simon's patches) is another aspect of accurate
counting for doing performance analysis at scale.
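One way to accumulate such an "io" figure is to route every blocking call through a wrapper that charges its duration to a counter on the ui object. The sketch below is hypothetical (the `UI` class and `timeblockedsection` method are illustrative names, not necessarily what Simon's patches do), but it shows the accounting idea.

```python
import time

class UI(object):
    """Hypothetical stand-in for Mercurial's ui object: accumulates time
    spent blocked on I/O or subprocesses in _iotime, mirroring the 'io'
    figure in the patch."""
    def __init__(self):
        self._iotime = 0.0

    def timeblockedsection(self, func, *args):
        # Wrap any blocking call so its duration is charged to _iotime.
        start = time.time()
        try:
            return func(*args)
        finally:
            self._iotime += time.time() - start

ui = UI()
ui.timeblockedsection(time.sleep, 0.2)   # e.g. waiting on ssh or a prompt
print("io %.3f" % ui._iotime)
```

Subtracting `_iotime` from the wall figure then approximates "time Mercurial spent really doing work".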
Augie Fackler - Feb. 6, 2017, 6:45 p.m.
On Thu, Feb 02, 2017 at 05:22:07PM -0800, Bryan O'Sullivan wrote:
> On Thu, Feb 2, 2017 at 4:33 PM, Bryan O'Sullivan <bos@serpentine.com> wrote:
>
> > ui: record and print wall clock time
> >
>
> This patch is a total hack, but it contains a few interesting nuggets that
> it doesn't call out explicitly.
>
> As Simon mentions in one of his patches, we log command performance so that
> we can analyse it at scale and identify, and then fix, problems. But
> today's simple measure of wallclock time is thrown into confusion by
> anything interactive: calls to ssh (might ask for a password), prompts for
> input, merges, pagers, you name it.
>
> Measuring CPU time isn't any better, because a cold disk cache can result
> in long I/O waits that aren't accounted for, but which a user will
> definitely notice. So a wallclock metric that more accurately tracks when
> Mercurial is "really doing work" offers significant value.
>
> The current infrastructure for measuring time (when you use "hg --time")
> runs as an atexit hook, so it stops counting when Mercurial is about to
> exit. Thanks to the pager infrastructure, it's possible to exit long after
> we've finished doing any real work, because we're waiting on the pager
> child process to quit.

I wonder if we shouldn't just ditch the atexit hook entirely and
instead print the recorded time as we exit the appropriate function in
dispatch?
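A sketch of that suggestion (hypothetical, not the real `dispatch`): report elapsed time from a `finally` clause as dispatch unwinds, so the measurement ends when the command does, before any pager wait at interpreter exit.

```python
import time

def dispatch(report):
    """Toy dispatch: time the command and report in a finally clause,
    instead of registering an atexit hook that fires much later."""
    starttime = time.time()
    try:
        time.sleep(0.1)  # stand-in for running the actual command
        return 0
    finally:
        # Runs as dispatch unwinds -- before any wait on a pager child.
        report(time.time() - starttime)

dispatch(lambda t: print("time: real %.3f secs" % t))
```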

>
> As a quantity to measure, "how long between us starting a timer and the
> process exiting" is arguably not (ever?) what we really want. Instead, I
> think it's "how long between starting and finishing meaningful work", which
> the "wall" figure in the patch above gets closer to. Being able to identify
> time spent waiting on I/O or subprocesses (the "io" figure in the patch,
> and the main focus of Simon's patches) is another aspect of accurate
> counting for doing performance analysis at scale.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
--- a/mercurial/dispatch.py
+++ b/mercurial/dispatch.py
@@ -93,6 +93,7 @@  def _formatparse(write, inst):
 
 def dispatch(req):
     "run the command specified in req.args"
+    starttime = time.time()
     if req.ferr:
         ferr = req.ferr
     elif req.ui:
@@ -123,7 +124,6 @@  def dispatch(req):
         return -1
 
     msg = ' '.join(' ' in a and repr(a) or a for a in req.args)
-    starttime = time.time()
     ret = None
     try:
         ret = _runcatch(req)
@@ -644,6 +644,7 @@  extraloaders = [
 ]
 
 def _dispatch(req):
+    starttime = time.time()
     args = req.args
     ui = req.ui
 
@@ -717,8 +718,10 @@  def _dispatch(req):
         s = get_times()
         def print_time():
             t = get_times()
-            ui.warn(_("time: real %.3f secs (user %.3f+%.3f sys %.3f+%.3f)\n") %
-                (t[4]-s[4], t[0]-s[0], t[2]-s[2], t[1]-s[1], t[3]-s[3]))
+            ui.warn(_("time: real %.3f secs "
+                      "(user %.3f+%.3f sys %.3f+%.3f io %.3f wall %.3f)\n") %
+                (t[4]-s[4], t[0]-s[0], t[2]-s[2], t[1]-s[1], t[3]-s[3],
+                 ui._iotime, ui._walltime))
         atexit.register(print_time)
 
     uis = set([ui, lui])
@@ -812,6 +815,7 @@  def _dispatch(req):
         finally:
             if repo and repo != req.repo:
                 repo.close()
+            ui._walltime = time.time() - starttime
 
 def _runcommand(ui, options, cmd, cmdfunc):
     """Run a command function, possibly with profiling enabled."""