Patchwork [4,of,8,v4] ui: log time spent blocked on stdio

Submitter Simon Farnsworth
Date Feb. 13, 2017, 5:29 p.m.
Message ID <923dcef529e523b391e2.1487006944@devvm022.lla2.facebook.com>
Permalink /patch/18448/
State Changes Requested

Comments

Simon Farnsworth - Feb. 13, 2017, 5:29 p.m.
# HG changeset patch
# User Simon Farnsworth <simonfar@fb.com>
# Date 1486994849 28800
#      Mon Feb 13 06:07:29 2017 -0800
# Node ID 923dcef529e523b391e26840cfa9ea6fd53c4131
# Parent  5a595ee2509989e440d70eb0e134fea915a0a8d6
ui: log time spent blocked on stdio

We use a wrapper around Mercurial at Facebook that logs key statistics (like
elapsed time) to our standard performance tooling.

This is less useful than it could be, because we currently can't tell whether a
command is slow because Mercurial needs fixing, or because the user isn't
interacting quickly.

Teach Mercurial to log the time it spends blocked, so that our tooling can
pick it up and submit it with the elapsed time - we can then do the math to
see if Mercurial is slow, or if the user simply failed to interact.

Combining this with the command duration log means that we can ensure that
we concentrate performance efforts on the things that bite Facebook users.

The perfwrite microbenchmark shifts to:
Linux:
! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
Mac:
! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)

If I open-code timings in ui.write instead of using the context manager, I see:
Linux:
! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
Mac:
! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
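[For context, a `timeblockedsection`-style context manager can be sketched as
below. This is a simplified stand-in for illustration, not the patch's actual
implementation, which lives on Mercurial's ui object and reports through its
logging machinery; the class and attribute names here are made up.]

```python
import contextlib
import time

class blockedtracker(object):
    """Accumulate wall-clock time spent blocked, keyed by tag.

    Simplified sketch of the idea in this patch; names are illustrative.
    """

    def __init__(self):
        self._blockedtimes = {}

    @contextlib.contextmanager
    def timeblockedsection(self, key):
        # time the body of the `with` block and accumulate it under
        # '<key>_blocked', in milliseconds
        starttime = time.time()
        try:
            yield
        finally:
            elapsed = (time.time() - starttime) * 1000
            key += '_blocked'
            self._blockedtimes[key] = self._blockedtimes.get(key, 0) + elapsed
```

A wrapper tool can then subtract the accumulated stdio-blocked time from the
total elapsed time to separate Mercurial's own cost from time spent waiting
on the user or the terminal.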
Bryan O'Sullivan - Feb. 13, 2017, 6 p.m.
On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com> wrote:

> The perfwrite microbenchmark shifts to:
> Linux:
> ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
> Mac:
> ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
>
> If I open-code timings in ui.write instead of using the context manager, I
> see:
> Linux:
> ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
> Mac:
> ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
>
>
Thanks for adding this data. (For future reference, it's good to include an
analysis of the difference instead of just the plain numbers, as I have to
flip back and forth between two emails and memorize the numbers to do the
comparison in my head.)

These numbers suggest to me that open-coding the performance measurement is
the right thing to do for ui.write and ui.flush.

Here's why.

First, let's discount the wall time, as this number is so noisy as to make
no sense. For example, if you compare the uninstrumented code with the
numbers above, you'll see that wall time has *dropped* by 30% or so on OSX,
even though we're doing more work. This is a good object lesson in why
contrib/perf.py is very far from a trustworthy benchmarking tool :-)

If you look instead at the somewhat less suspect "comb" time (user+system
combined), you'll see that it more than doubles under the context manager,
vs a smallish bump with open coding of the measurements.

With its fancy templating and colour support, Mercurial is already
surprisingly slow at simply writing to stdout, an activity that is
obviously on the critical path for many uses. I believe it's worth the few
extra lines of code to keep the lowest-level part of this as fast as
possible.
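[The per-call overhead Bryan describes can be checked with a standalone
microbenchmark; `timedsection` below is a hypothetical stand-in for the
patch's context manager, and absolute numbers will vary by machine.]

```python
import contextlib
import time
import timeit

blocked = {'stdio': 0.0}

@contextlib.contextmanager
def timedsection(key):
    # generator-based context manager: each use allocates a generator and
    # goes through contextlib's __enter__/__exit__ machinery
    starttime = time.time()
    try:
        yield
    finally:
        blocked[key] += time.time() - starttime

def via_contextmanager():
    with timedsection('stdio'):
        pass

def open_coded():
    # same measurement, but inlined: no generator, no extra frames
    starttime = time.time()
    try:
        pass
    finally:
        blocked['stdio'] += time.time() - starttime

n = 100000
cm = timeit.timeit(via_contextmanager, number=n)
oc = timeit.timeit(open_coded, number=n)
print('context manager: %.3fs  open-coded: %.3fs' % (cm, oc))
```

On a hot path like ui.write, that fixed per-call cost is paid on every write,
which is consistent with the comb-time difference in the numbers above.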
Simon Farnsworth - Feb. 13, 2017, 6:15 p.m.
On 13/02/2017 18:00, Bryan O'Sullivan wrote:
>
> On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com
> <mailto:simonfar@fb.com>> wrote:
>
>     The perfwrite microbenchmark shifts to:
>     Linux:
>     ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
>     Mac:
>     ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
>
>     If I open-code timings in ui.write instead of using the context
>     manager, I see:
>     Linux:
>     ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
>     Mac:
>     ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
>
>
> Thanks for adding this data. (For future reference, it's good to include
> an analysis of the difference instead of just the plain numbers, as I
> have to flip back and forth between two emails and memorize the numbers
> to do the comparison in my head.)
>
> These numbers suggest to me that open-coding the performance measurement
> is the right thing to do for ui.write and ui.flush.
>
I'll make that change locally, but I'm not going to resubmit for a few 
days, to give other interested parties time to weigh in.
Augie Fackler - Feb. 13, 2017, 7:40 p.m.
On Mon, Feb 13, 2017 at 10:00:25AM -0800, Bryan O'Sullivan wrote:
> On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com> wrote:
>
> > The perfwrite microbenchmark shifts to:
> > Linux:
> > ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
> > Mac:
> > ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
> >
> > If I open-code timings in ui.write instead of using the context manager, I
> > see:
> > Linux:
> > ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
> > Mac:
> > ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
> >
> >
> Thanks for adding this data. (For future reference, it's good to include an
> analysis of the difference instead of just the plain numbers, as I have to
> flip back and forth between two emails and memorize the numbers to do the
> comparison in my head.)
>
> These numbers suggest to me that open-coding the performance measurement is
> the right thing to do for ui.write and ui.flush.

+1 on open-coding for perf reasons, maybe make a comment to that
effect so future optimizers don't get overeager.

>
> Here's why.
>
> First, let's discount the wall time, as this number is so noisy as to make
> no sense. For example, if you compare the uninstrumented code with the
> numbers above, you'll see that wall time has *dropped* by 30% or so on OSX,
> even though we're doing more work. This is a good object lesson in why
> contrib/perf.py is very far from a trustworthy benchmarking tool :-)
>
> If you look instead at the somewhat less suspect "comb" time (user+system
> combined), you'll see that it more than doubles under the context manager,
> vs a smallish bump with open coding of the measurements.
>
> With its fancy templating and colour support, Mercurial is already
> surprisingly slow at simply writing to stdout, an activity that is
> obviously on the critical path for many uses. I believe it's worth the few
> extra lines of code to keep the lowest-level part of this as fast as
> possible.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -728,31 +728,34 @@ 
             self._buffers[-1].extend(a for a in args)
         else:
             self._progclear()
-            for a in args:
-                self.fout.write(a)
+            with self.timeblockedsection('stdio'):
+                for a in args:
+                    self.fout.write(a)
 
     def write_err(self, *args, **opts):
         self._progclear()
         try:
             if self._bufferstates and self._bufferstates[-1][0]:
                 return self.write(*args, **opts)
-            if not getattr(self.fout, 'closed', False):
-                self.fout.flush()
-            for a in args:
-                self.ferr.write(a)
-            # stderr may be buffered under win32 when redirected to files,
-            # including stdout.
-            if not getattr(self.ferr, 'closed', False):
-                self.ferr.flush()
+            with self.timeblockedsection('stdio'):
+                if not getattr(self.fout, 'closed', False):
+                    self.fout.flush()
+                for a in args:
+                    self.ferr.write(a)
+                # stderr may be buffered under win32 when redirected to files,
+                # including stdout.
+                if not getattr(self.ferr, 'closed', False):
+                    self.ferr.flush()
         except IOError as inst:
             if inst.errno not in (errno.EPIPE, errno.EIO, errno.EBADF):
                 raise
 
     def flush(self):
-        try: self.fout.flush()
-        except (IOError, ValueError): pass
-        try: self.ferr.flush()
-        except (IOError, ValueError): pass
+        with self.timeblockedsection('stdio'):
+            try: self.fout.flush()
+            except (IOError, ValueError): pass
+            try: self.ferr.flush()
+            except (IOError, ValueError): pass
 
     def _isatty(self, fh):
         if self.configbool('ui', 'nontty', False):
@@ -914,7 +917,8 @@ 
         sys.stdout = self.fout
         # prompt ' ' must exist; otherwise readline may delete entire line
         # - http://bugs.python.org/issue12833
-        line = raw_input(' ')
+        with self.timeblockedsection('stdio'):
+            line = raw_input(' ')
         sys.stdin = oldin
         sys.stdout = oldout
 
@@ -994,13 +998,14 @@ 
             self.write_err(self.label(prompt or _('password: '), 'ui.prompt'))
             # disable getpass() only if explicitly specified. it's still valid
             # to interact with tty even if fin is not a tty.
-            if self.configbool('ui', 'nontty'):
-                l = self.fin.readline()
-                if not l:
-                    raise EOFError
-                return l.rstrip('\n')
-            else:
-                return getpass.getpass('')
+            with self.timeblockedsection('stdio'):
+                if self.configbool('ui', 'nontty'):
+                    l = self.fin.readline()
+                    if not l:
+                        raise EOFError
+                    return l.rstrip('\n')
+                else:
+                    return getpass.getpass('')
         except EOFError:
             raise error.ResponseExpected()
     def status(self, *msg, **opts):