Patchwork [6,of,6,frozen-repos] commands.log: operate on frozen repository

login
register
mail settings
Submitter Gregory Szorc
Date Nov. 22, 2015, 1:14 a.m.
Message ID <f8ab893d8dac5246224f.1448154843@ubuntu-main>
Download mbox | patch
Permalink /patch/11570/
State Deferred
Delegated to: Pierre-Yves David
Headers show

Comments

Gregory Szorc - Nov. 22, 2015, 1:14 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1448152280 28800
#      Sat Nov 21 16:31:20 2015 -0800
# Node ID f8ab893d8dac5246224f330fafdea35c0bcef313
# Parent  ad891b362564b30ec69bc844a3fd98e73b6e032e
commands.log: operate on frozen repository

`hg log` is a read-only command. As such, it can use a frozen
repository instance so it isn't constantly performing changelog
freshness checks.

On a local Firefox repository with ~312,000 changesets, this
simple change results in `hg log -T '{rev}\n' > /dev/null`
with an empty/default hgrc becoming faster:

before: 7.81s
after:  6.11s
delta: -1.70s (78.2% of original

A full `hg log` shows similar percentage improvement:

before: 31.14s
after:  24.95s
delta:  -6.19s (80.1% of original)

I'm actually surprised the full `hg log` output showed a similar
percentage speedup: I would have expected it to only decrease by
~1.70s. But counting function calls does reveal that the full
template output incurs an extra changectx instantiation for each
revision as well as additional changelog accesses (for parents
I believe).

For comparison purposes, git 2.6.3 for the same count of commits on
a Git conversion of the Firefox repo takes ~1.85s for
`git log --pretty=%H` and ~2.36s for `git log`. We're 3.3x slower
for printing revisions (4.2x before) and 10.6x slower for the
default template output (13.2x before).
Gregory Szorc - Nov. 22, 2015, 2:48 a.m.
On Sat, Nov 21, 2015 at 5:14 PM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1448152280 28800
> #      Sat Nov 21 16:31:20 2015 -0800
> # Node ID f8ab893d8dac5246224f330fafdea35c0bcef313
> # Parent  ad891b362564b30ec69bc844a3fd98e73b6e032e
> commands.log: operate on frozen repository
>
> `hg log` is a read-only command. As such, it can use a frozen
> repository instance so it isn't constantly performing changelog
> freshness checks.
>
> On a local Firefox repository with ~312,000 changesets, this
> simple change results in `hg log -T '{rev}\n' > /dev/null`
> with an empty/default hgrc becoming faster:
>
> before: 7.81s
> after:  6.11s
> delta: -1.70s (78.2% of original
>
> A full `hg log` shows similar percentage improvement:
>
> before: 31.14s
> after:  24.95s
> delta:  -6.19s (80.1% of original)
>
> I'm actually surprised the full `hg log` output showed a similar
> percentage speedup: I would have expected it to only decrease by
> ~1.70s. But counting function calls does reveal that the full
> template output incurs an extra changectx instantiation for each
> revision as well as additional changelog accesses (for parents
> I believe).
>
> For comparison purposes, git 2.6.3 for the same count of commits on
> a Git conversion of the Firefox repo takes ~1.85s for
> `git log --pretty=%H` and ~2.36s for `git log`. We're 3.3x slower
> for printing revisions (4.2x before) and 10.6x slower for the
> default template output (13.2x before).
>

So, I'm looking at further ways to optimize `hg log` performance.

`hg log -q` is a fast path that doesn't involve the templating system. It's
insanely fast. 2.79s, or 2.19x faster.

For kicks, I experimented with PyPy 4.0.0. `hg log -q > /dev/null` is 0.50s
(faster than Git!). `hg log` is 7.96s. That's 12.2x and 3.13x faster,
respectively. Insane. I experimented with PyPy for `hg unbundle`
performance analysis a few weeks ago. changelog reading was 2-3x faster.
Manifest reading was slower. But apparently that is due to the manifest
reading Python code doing stupid things compared to the C code.

PyPy appears to have a ~40ms startup overhead compared to CPython. Throw in
chg and that disappears and suddenly PyPy seems like a pretty compelling
alternative to writing code in languages that aren't Python (C, Rust, etc).

Anyway, it's pretty clear the templating code is a bit of a performance
bottleneck. I'll see if I can work some more magic...


>
> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -4784,16 +4784,18 @@ def log(ui, repo, *pats, **opts):
>      specifying revisions.
>
>      See :hg:`help templates` for more about pre-packaged styles and
>      specifying custom templates.
>
>      Returns 0 on success.
>
>      """
> +    repo = repo.frozen()
> +
>      if opts.get('follow') and opts.get('rev'):
>          opts['rev'] = [revset.formatspec('reverse(::%lr)',
> opts.get('rev'))]
>          del opts['follow']
>
>      if opts.get('graph'):
>          return cmdutil.graphlog(ui, repo, *pats, **opts)
>
>      revs, expr, filematcher = cmdutil.getlogrevs(repo, pats, opts)
>
Yuya Nishihara - Nov. 23, 2015, 6:42 a.m.
On Sat, 21 Nov 2015 18:48:10 -0800, Gregory Szorc wrote:
> On Sat, Nov 21, 2015 at 5:14 PM, Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1448152280 28800
> > #      Sat Nov 21 16:31:20 2015 -0800
> > # Node ID f8ab893d8dac5246224f330fafdea35c0bcef313
> > # Parent  ad891b362564b30ec69bc844a3fd98e73b6e032e
> > commands.log: operate on frozen repository
> >
> > `hg log` is a read-only command. As such, it can use a frozen
> > repository instance so it isn't constantly performing changelog
> > freshness checks.
> >
> > On a local Firefox repository with ~312,000 changesets, this
> > simple change results in `hg log -T '{rev}\n' > /dev/null`
> > with an empty/default hgrc becoming faster:
> >
> > before: 7.81s
> > after:  6.11s
> > delta: -1.70s (78.2% of original
> >
> > A full `hg log` shows similar percentage improvement:
> >
> > before: 31.14s
> > after:  24.95s
> > delta:  -6.19s (80.1% of original)
> >
> > I'm actually surprised the full `hg log` output showed a similar
> > percentage speedup: I would have expected it to only decrease by
> > ~1.70s. But counting function calls does reveal that the full
> > template output incurs an extra changectx instantiation for each
> > revision as well as additional changelog accesses (for parents
> > I believe).
> >
> > For comparison purposes, git 2.6.3 for the same count of commits on
> > a Git conversion of the Firefox repo takes ~1.85s for
> > `git log --pretty=%H` and ~2.36s for `git log`. We're 3.3x slower
> > for printing revisions (4.2x before) and 10.6x slower for the
> > default template output (13.2x before).
> >
> 
> So, I'm looking at further ways to optimize `hg log` performance.
> 
> `hg log -q` is a fast path that doesn't involve the templating system. It's
> insanely fast. 2.79s, or 2.19x faster.

"log -q" still involves the templater.

  % export HGRCPATH=/dev/null
  % export LANG=C

  % hg log --time -l20000 -Tdefault > /dev/null
  time: real 7.210 secs (user 7.190+0.000 sys 0.020+0.000)
  % hg log --time -l20000  > /dev/null
  time: real 3.260 secs (user 3.250+0.000 sys 0.010+0.000)

  % hg log --time -l20000 -Tdefault -q > /dev/null
  time: real 1.130 secs (user 1.120+0.000 sys 0.010+0.000)
  % hg log --time -l20000 -q > /dev/null
  time: real 0.370 secs (user 0.370+0.000 sys 0.020+0.000)

> For kicks, I experimented with PyPy 4.0.0. `hg log -q > /dev/null` is 0.50s
> (faster than Git!). `hg log` is 7.96s. That's 12.2x and 3.13x faster,
> respectively. Insane. I experimented with PyPy for `hg unbundle`
> performance analysis a few weeks ago. changelog reading was 2-3x faster.
> Manifest reading was slower. But apparently that is due to the manifest
> reading Python code doing stupid things compared to the C code.
> 
> PyPy appears to have a ~40ms startup overhead compared to CPython. Throw in
> chg and that disappears and suddenly PyPy seems like a pretty compelling
> alternative to writing code in languages that aren't Python (C, Rust, etc).

I should say that chg can't be run on Windows.

> Anyway, it's pretty clear the templating code is a bit of a performance
> bottleneck. I'll see if I can work some more magic...

Yes, and PyPy can optimize it as templater would execute the same code path
repeatedly.

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -4784,16 +4784,18 @@  def log(ui, repo, *pats, **opts):
     specifying revisions.
 
     See :hg:`help templates` for more about pre-packaged styles and
     specifying custom templates.
 
     Returns 0 on success.
 
     """
+    repo = repo.frozen()
+
     if opts.get('follow') and opts.get('rev'):
         opts['rev'] = [revset.formatspec('reverse(::%lr)', opts.get('rev'))]
         del opts['follow']
 
     if opts.get('graph'):
         return cmdutil.graphlog(ui, repo, *pats, **opts)
 
     revs, expr, filematcher = cmdutil.getlogrevs(repo, pats, opts)