Patchwork [1,of,2] perf: add perfbdiff

login
register
mail settings
Submitter Gregory Szorc
Date Nov. 6, 2016, 7:42 a.m.
Message ID <e65ef545a3dcb46fe0f1.1478418147@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/17373/
State Accepted
Headers show

Comments

Gregory Szorc - Nov. 6, 2016, 7:42 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1478414512 25200
#      Sat Nov 05 23:41:52 2016 -0700
# Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452
# Parent  f01367faa792635ad2f7a6b175ae3252292b5121
perf: add perfbdiff

bdiff shows up a lot in profiling. I think it would be useful to have
a perf command that runs bdiff over and over so we can find hot spots.
Mads Kiilerich - Nov. 6, 2016, 4:15 p.m.
On 11/06/2016 08:42 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1478414512 25200
> #      Sat Nov 05 23:41:52 2016 -0700
> # Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452
> # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
> perf: add perfbdiff
>
> bdiff shows up a lot in profiling. I think it would be useful to have
> a perf command that runs bdiff over and over so we can find hot spots.
>
> diff --git a/contrib/perf.py b/contrib/perf.py
> --- a/contrib/perf.py
> +++ b/contrib/perf.py
> @@ -25,6 +25,7 @@ import random
>   import sys
>   import time
>   from mercurial import (
> +    bdiff,
>       changegroup,
>       cmdutil,
>       commands,
> @@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts):
>       timer(d)
>       fm.end()
>   
> +@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')

I would prefer to keep it simple and consistently use -r for specifying 
revisions.

> +def perfbdiff(ui, repo, file_, rev=None, **opts):
> +    """benchmark a bdiff between a revision and its delta parent"""
> +    if opts.get('changelog') or opts.get('manifest'):
> +        file_, rev = None, file_
> +    elif rev is None:
> +        raise error.CommandError('perfbdiff', 'invalid arguments')
> +
> +    r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
> +
> +    node = r.lookup(rev)
> +    rev = r.rev(node)

This might be a stupid lazy question that essential is a request for 
more clarity in code and docstring:
Why this back and forth between rev and node?
Must rev always be a filelog revision ... or is it a changelog revision 
which then is changed to revlog revision while reusing the variable 
name? Perhaps reduce confusion by using different names.

> +    dp = r.deltaparent(rev)

Should it also consider aggressivemergedeltas?

Or perhaps more generally: should it be possible to compare any two 
revisions - especially for filelogs, to also cover the use case of diff?


More important: patch 2 LGTM.

/Mads
Gregory Szorc - Nov. 6, 2016, 5:34 p.m.
On Sun, Nov 6, 2016 at 8:15 AM, Mads Kiilerich <mads@kiilerich.com> wrote:

> On 11/06/2016 08:42 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1478414512 25200
>> #      Sat Nov 05 23:41:52 2016 -0700
>> # Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452
>> # Parent  f01367faa792635ad2f7a6b175ae3252292b5121
>> perf: add perfbdiff
>>
>> bdiff shows up a lot in profiling. I think it would be useful to have
>> a perf command that runs bdiff over and over so we can find hot spots.
>>
>> diff --git a/contrib/perf.py b/contrib/perf.py
>> --- a/contrib/perf.py
>> +++ b/contrib/perf.py
>> @@ -25,6 +25,7 @@ import random
>>   import sys
>>   import time
>>   from mercurial import (
>> +    bdiff,
>>       changegroup,
>>       cmdutil,
>>       commands,
>> @@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts):
>>       timer(d)
>>       fm.end()
>>   +@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')
>>
>
> I would prefer to keep it simple and consistently use -r for specifying
> revisions.
>
> +def perfbdiff(ui, repo, file_, rev=None, **opts):
>> +    """benchmark a bdiff between a revision and its delta parent"""
>> +    if opts.get('changelog') or opts.get('manifest'):
>> +        file_, rev = None, file_
>> +    elif rev is None:
>> +        raise error.CommandError('perfbdiff', 'invalid arguments')
>> +
>> +    r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
>> +
>> +    node = r.lookup(rev)
>> +    rev = r.rev(node)
>>
>
> This might be a stupid lazy question that essential is a request for more
> clarity in code and docstring:
> Why this back and forth between rev and node?
> Must rev always be a filelog revision ... or is it a changelog revision
> which then is changed to revlog revision while reusing the variable name?
> Perhaps reduce confusion by using different names.
>
> +    dp = r.deltaparent(rev)
>>
>
> Should it also consider aggressivemergedeltas?
>
> Or perhaps more generally: should it be possible to compare any two
> revisions - especially for filelogs, to also cover the use case of diff?
>

There are a lot of potential improvements to this command. I felt like
getting the simplest thing implemented to justify the trivial bdiff.c patch.

Now that I see that line hashing is a hot spot and changing line hashing
means potential for performance loss due to hash collisions later in the
algorithm, I'll have to benchmark a much larger corpus to validate any
changes to the hash. And that means improving perfbdiff. So stand by...


>
>
> More important: patch 2 LGTM.
>
> /Mads
>
>

Patch

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -25,6 +25,7 @@  import random
 import sys
 import time
 from mercurial import (
+    bdiff,
     changegroup,
     cmdutil,
     commands,
@@ -746,6 +747,30 @@  def perffncacheencode(ui, repo, **opts):
     timer(d)
     fm.end()
 
+@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV')
+def perfbdiff(ui, repo, file_, rev=None, **opts):
+    """benchmark a bdiff between a revision and its delta parent"""
+    if opts.get('changelog') or opts.get('manifest'):
+        file_, rev = None, file_
+    elif rev is None:
+        raise error.CommandError('perfbdiff', 'invalid arguments')
+
+    r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts)
+
+    node = r.lookup(rev)
+    rev = r.rev(node)
+    dp = r.deltaparent(rev)
+
+    text1 = r.revision(dp)
+    text2 = r.revision(node)
+
+    def d():
+        bdiff.bdiff(text1, text2)
+
+    timer, fm = gettimer(ui, opts)
+    timer(d)
+    fm.end()
+
 @command('perfdiffwd', formatteropts)
 def perfdiffwd(ui, repo, **opts):
     """Profile diff of working directory changes"""
diff --git a/tests/test-contrib-perf.t b/tests/test-contrib-perf.t
--- a/tests/test-contrib-perf.t
+++ b/tests/test-contrib-perf.t
@@ -50,6 +50,7 @@  perfstatus
    perfancestorset
                  (no help text available)
    perfannotate  (no help text available)
+   perfbdiff     benchmark a bdiff between a revision and its delta parent
    perfbranchmap
                  benchmark the update of a branchmap
    perfcca       (no help text available)
@@ -112,6 +113,7 @@  perfstatus
   $ hg perfancestors
   $ hg perfancestorset 2
   $ hg perfannotate a
+  $ hg perfbdiff -c 1
   $ hg perfbranchmap
   $ hg perfcca
   $ hg perfchangegroupchangelog