| Submitter | Simon Farnsworth |
|---|---|
| Date | Feb. 13, 2017, 5:29 p.m. |
| Message ID | `<923dcef529e523b391e2.1487006944@devvm022.lla2.facebook.com>` |
| Download | mbox, patch |
| Permalink | /patch/18448/ |
| State | Changes Requested |
Comments
On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com> wrote:

> The perfwrite microbenchmark shifts to:
> Linux:
> ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
> Mac:
> ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
>
> If I open-code timings in ui.write instead of using the context manager, I
> see:
> Linux:
> ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
> Mac:
> ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)

Thanks for adding this data. (For future reference, it's good to include an
analysis of the difference instead of just the plain numbers, as I have to
flip back and forth between two emails and memorize the numbers to do the
comparison in my head.)

These numbers suggest to me that open-coding the performance measurement is
the right thing to do for ui.write and ui.flush.

Here's why.

First, let's discount the wall time, as this number is so noisy as to make
no sense. For example, if you compare the uninstrumented code with the
numbers above, you'll see that wall time has *dropped* by 30% or so on OSX,
even though we're doing more work. This is a good object lesson in why
contrib/perf.py is very far from a trustworthy benchmarking tool :-)

If you look instead at the somewhat less suspect "comb" time (user+system
combined), you'll see that it more than doubles under the context manager,
vs a smallish bump with open coding of the measurements.

With its fancy templating and colour support, Mercurial is already
surprisingly slow at simply writing to stdout, an activity that is obviously
on the critical path for many uses. I believe it's worth the few extra lines
of code to keep the lowest-level part of this as fast as possible.
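(Aside for readers of the archive: in the numbers above, "wall" is elapsed clock time and "comb" is CPU time actually charged to the process, user plus sys. The sketch below shows one way a perf-style timer could report in that format; it is illustrative only and assumes nothing about the real contrib/perf.py internals beyond its output line.)

```python
import os
import time

def bestof(func, repeats=3):
    """Illustrative wall-vs-comb timer (not the actual contrib/perf.py code).

    'wall' is elapsed clock time, which also counts scheduler delays and
    anything else the machine was doing, hence the noise; 'comb' is the CPU
    time charged to this process (user + sys).
    """
    best = None
    for _ in range(repeats):
        wallstart = time.time()
        cpustart = os.times()       # (user, sys, cuser, csys, elapsed)
        func()
        cpuend = os.times()
        wall = time.time() - wallstart
        user = cpuend[0] - cpustart[0]
        system = cpuend[1] - cpustart[1]
        sample = (wall, user + system, user, system)
        if best is None or sample < best:
            best = sample
    print('! wall %f comb %f user %f sys %f (best of %d)'
          % (best + (repeats,)))
```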
On 13/02/2017 18:00, Bryan O'Sullivan wrote:

> On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com> wrote:
>
>     The perfwrite microbenchmark shifts to:
>     Linux:
>     ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
>     Mac:
>     ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
>
>     If I open-code timings in ui.write instead of using the context
>     manager, I see:
>     Linux:
>     ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
>     Mac:
>     ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
>
> Thanks for adding this data. (For future reference, it's good to include
> an analysis of the difference instead of just the plain numbers, as I
> have to flip back and forth between two emails and memorize the numbers
> to do the comparison in my head.)
>
> These numbers suggest to me that open-coding the performance measurement
> is the right thing to do for ui.write and ui.flush.

I'll make that change locally, but I'm not going to resubmit for a few days,
to give other interested parties time to weigh in.
On Mon, Feb 13, 2017 at 10:00:25AM -0800, Bryan O'Sullivan wrote:
> On Mon, Feb 13, 2017 at 9:29 AM, Simon Farnsworth <simonfar@fb.com> wrote:
>
> > The perfwrite microbenchmark shifts to:
> > Linux:
> > ! wall 3.316087 comb 0.900000 user 0.810000 sys 0.090000 (best of 3)
> > Mac:
> > ! wall 0.939282 comb 0.580000 user 0.470000 sys 0.110000 (best of 8)
> >
> > If I open-code timings in ui.write instead of using the context manager, I
> > see:
> > Linux:
> > ! wall 3.478070 comb 0.500000 user 0.420000 sys 0.080000 (best of 3)
> > Mac:
> > ! wall 0.218112 comb 0.220000 user 0.150000 sys 0.070000 (best of 15)
>
> Thanks for adding this data. (For future reference, it's good to include an
> analysis of the difference instead of just the plain numbers, as I have to
> flip back and forth between two emails and memorize the numbers to do the
> comparison in my head.)
>
> These numbers suggest to me that open-coding the performance measurement is
> the right thing to do for ui.write and ui.flush.

+1 on open-coding for perf reasons, maybe make a comment to that effect so
future optimizers don't get overeager.

> Here's why.
>
> First, let's discount the wall time, as this number is so noisy as to make
> no sense. For example, if you compare the uninstrumented code with the
> numbers above, you'll see that wall time has *dropped* by 30% or so on OSX,
> even though we're doing more work. This is a good object lesson in why
> contrib/perf.py is very far from a trustworthy benchmarking tool :-)
>
> If you look instead at the somewhat less suspect "comb" time (user+system
> combined), you'll see that it more than doubles under the context manager,
> vs a smallish bump with open coding of the measurements.
>
> With its fancy templating and colour support, Mercurial is already
> surprisingly slow at simply writing to stdout, an activity that is
> obviously on the critical path for many uses. I believe it's worth the few
> extra lines of code to keep the lowest-level part of this as fast as
> possible.
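(To make the "open-coding the timings" alternative concrete, here is a rough sketch of how ui.write could measure inline instead of via the context manager used in the patch below. The attribute name `_blockedtimes` and the `'stdio_blocked'` key are assumptions for illustration, not taken from the patch.)

```python
import collections
import sys
import time


class opencodedui(object):
    """Sketch of a ui-like object that times writes without a context manager.

    The accounting is the same as the context-manager version; the point is
    that no generator frame or with-statement machinery sits on the hot path.
    A comment next to the inline timing, as suggested above, would record
    that the duplication is deliberate and performance-motivated.
    """

    def __init__(self, fout=sys.stdout):
        self.fout = fout
        # assumed accumulator of time spent blocked on stdio, in seconds
        self._blockedtimes = collections.defaultdict(float)

    def write(self, *args):
        # open-coded timing: measure inline rather than via a contextmanager
        starttime = time.time()
        try:
            for a in args:
                self.fout.write(a)
        finally:
            self._blockedtimes['stdio_blocked'] += time.time() - starttime
```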
Patch
diff --git a/mercurial/ui.py b/mercurial/ui.py
--- a/mercurial/ui.py
+++ b/mercurial/ui.py
@@ -728,31 +728,34 @@
             self._buffers[-1].extend(a for a in args)
         else:
             self._progclear()
-            for a in args:
-                self.fout.write(a)
+            with self.timeblockedsection('stdio'):
+                for a in args:
+                    self.fout.write(a)
 
     def write_err(self, *args, **opts):
         self._progclear()
         try:
             if self._bufferstates and self._bufferstates[-1][0]:
                 return self.write(*args, **opts)
-            if not getattr(self.fout, 'closed', False):
-                self.fout.flush()
-            for a in args:
-                self.ferr.write(a)
-            # stderr may be buffered under win32 when redirected to files,
-            # including stdout.
-            if not getattr(self.ferr, 'closed', False):
-                self.ferr.flush()
+            with self.timeblockedsection('stdio'):
+                if not getattr(self.fout, 'closed', False):
+                    self.fout.flush()
+                for a in args:
+                    self.ferr.write(a)
+                # stderr may be buffered under win32 when redirected to files,
+                # including stdout.
+                if not getattr(self.ferr, 'closed', False):
+                    self.ferr.flush()
         except IOError as inst:
             if inst.errno not in (errno.EPIPE, errno.EIO, errno.EBADF):
                 raise
 
     def flush(self):
-        try: self.fout.flush()
-        except (IOError, ValueError): pass
-        try: self.ferr.flush()
-        except (IOError, ValueError): pass
+        with self.timeblockedsection('stdio'):
+            try: self.fout.flush()
+            except (IOError, ValueError): pass
+            try: self.ferr.flush()
+            except (IOError, ValueError): pass
 
     def _isatty(self, fh):
         if self.configbool('ui', 'nontty', False):
@@ -914,7 +917,8 @@
         sys.stdout = self.fout
         # prompt ' ' must exist; otherwise readline may delete entire line
         # - http://bugs.python.org/issue12833
-        line = raw_input(' ')
+        with self.timeblockedsection('stdio'):
+            line = raw_input(' ')
         sys.stdin = oldin
         sys.stdout = oldout
 
@@ -994,13 +998,14 @@
             self.write_err(self.label(prompt or _('password: '), 'ui.prompt'))
             # disable getpass() only if explicitly specified. it's still valid
             # to interact with tty even if fin is not a tty.
-            if self.configbool('ui', 'nontty'):
-                l = self.fin.readline()
-                if not l:
-                    raise EOFError
-                return l.rstrip('\n')
-            else:
-                return getpass.getpass('')
+            with self.timeblockedsection('stdio'):
+                if self.configbool('ui', 'nontty'):
+                    l = self.fin.readline()
+                    if not l:
+                        raise EOFError
+                    return l.rstrip('\n')
+                else:
+                    return getpass.getpass('')
         except EOFError:
             raise error.ResponseExpected()
 
     def status(self, *msg, **opts):
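(The diff above calls `self.timeblockedsection('stdio')` but does not show its definition. As a rough guess at its shape — a sketch only; the real helper in mercurial/ui.py may differ in names and in how it measures time — such a context manager could look like this. The per-call cost being debated in the thread is exactly the overhead of entering and leaving this generator-based context manager on every ui.write.)

```python
import collections
import contextlib
import time


class timedui(object):
    def __init__(self):
        # assumed accumulator mapping e.g. 'stdio_blocked' to seconds spent
        self._blockedtimes = collections.defaultdict(float)

    @contextlib.contextmanager
    def timeblockedsection(self, key):
        # accumulate how long the enclosed block spent blocked (e.g. on I/O)
        starttime = time.time()
        try:
            yield
        finally:
            self._blockedtimes[key + '_blocked'] += time.time() - starttime
```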