Patchwork annotate: optimize line counting

Submitter Matt Mackall
Date May 23, 2016, 4:23 p.m.
Message ID <218a9c673a2cc278b1d5.1464020626@ruin.waste.org>
Permalink /patch/15181/
State Accepted

Comments

Matt Mackall - May 23, 2016, 4:23 p.m.
# HG changeset patch
# User Matt Mackall <mpm@selenic.com>
# Date 1463607452 18000
#      Wed May 18 16:37:32 2016 -0500
# Node ID 218a9c673a2cc278b1d53ba28eb0ec46b64d5b4e
# Parent  8c8442523eefac2d53e3f10ff1ebf37f4d3c63c3
annotate: optimize line counting

We used len(text.splitlines()) to count lines. This allocates, copies, and
deallocates an object for every line in a file. Instead, we use
count("\n") to count newlines and adjust based on whether there's a
trailing newline.

This improves the speed of annotating localrepo.py from 4.2 to 4.0
seconds.
Augie Fackler - May 24, 2016, 3:01 p.m.
On Mon, May 23, 2016 at 11:23:46AM -0500, Matt Mackall wrote:
> # HG changeset patch
> # User Matt Mackall <mpm@selenic.com>
> # Date 1463607452 18000
> #      Wed May 18 16:37:32 2016 -0500
> # Node ID 218a9c673a2cc278b1d53ba28eb0ec46b64d5b4e
> # Parent  8c8442523eefac2d53e3f10ff1ebf37f4d3c63c3
> annotate: optimize line counting

Queued, thanks.

>
> We used len(text.splitlines()) to count lines.

Did I just hear circus music in the distance? *sigh*

> This allocates, copies, and
> deallocates an object for every line in a file. Instead, we use
> count("\n") to count newlines and adjust based on whether there's a
> trailing newline.
>
> This improves the speed of annotating localrepo.py from 4.2 to 4.0
> seconds.
>
> diff -r 8c8442523eef -r 218a9c673a2c mercurial/context.py
> --- a/mercurial/context.py	Tue May 17 11:28:46 2016 -0500
> +++ b/mercurial/context.py	Wed May 18 16:37:32 2016 -0500
> @@ -930,16 +930,20 @@
>          this returns fixed value(False is used) as linenumber,
>          if "linenumber" parameter is "False".'''
>
> +        def lines(text):
> +            if text.endswith("\n"):
> +                return text.count("\n")
> +            return text.count("\n") + 1
> +
>          if linenumber is None:
>              def decorate(text, rev):
> -                return ([rev] * len(text.splitlines()), text)
> +                return ([rev] * lines(text), text)
>          elif linenumber:
>              def decorate(text, rev):
> -                size = len(text.splitlines())
> -                return ([(rev, i) for i in xrange(1, size + 1)], text)
> +                return ([(rev, i) for i in xrange(1, lines(text) + 1)], text)
>          else:
>              def decorate(text, rev):
> -                return ([(rev, False)] * len(text.splitlines()), text)
> +                return ([(rev, False)] * lines(text), text)
>
>          def pair(parent, child):
>              blocks = mdiff.allblocks(parent[1], child[1], opts=diffopts,

Patch

diff -r 8c8442523eef -r 218a9c673a2c mercurial/context.py
--- a/mercurial/context.py	Tue May 17 11:28:46 2016 -0500
+++ b/mercurial/context.py	Wed May 18 16:37:32 2016 -0500
@@ -930,16 +930,20 @@ 
         this returns fixed value(False is used) as linenumber,
         if "linenumber" parameter is "False".'''
 
+        def lines(text):
+            if text.endswith("\n"):
+                return text.count("\n")
+            return text.count("\n") + 1
+
         if linenumber is None:
             def decorate(text, rev):
-                return ([rev] * len(text.splitlines()), text)
+                return ([rev] * lines(text), text)
         elif linenumber:
             def decorate(text, rev):
-                size = len(text.splitlines())
-                return ([(rev, i) for i in xrange(1, size + 1)], text)
+                return ([(rev, i) for i in xrange(1, lines(text) + 1)], text)
         else:
             def decorate(text, rev):
-                return ([(rev, False)] * len(text.splitlines()), text)
+                return ([(rev, False)] * lines(text), text)
 
         def pair(parent, child):
             blocks = mdiff.allblocks(parent[1], child[1], opts=diffopts,
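
Note on the counting helper

As a rough check of the reasoning in the commit message, the sketch below compares the new newline-counting helper with the old len(text.splitlines()) expression. It is written for Python 3 with bytes input, which is an assumption (the patch itself targets Python 2 str), and the names lines_splitlines and samples are hypothetical, introduced only for this illustration. The two approaches agree for ordinary "\n"-terminated text, but appear to differ for an empty string and for bare "\r" line endings; whether annotate ever sees such inputs is not stated in the thread.

```python
def lines(text):
    # Same logic as the helper added by the patch, using bytes literals
    # (assumption: Python 3 bytes stand in for Python 2 str).
    if text.endswith(b"\n"):
        return text.count(b"\n")
    return text.count(b"\n") + 1


def lines_splitlines(text):
    # The expression the patch replaces; it builds one object per line.
    return len(text.splitlines())


# Hypothetical sample inputs where both approaches give the same count.
samples = [b"a\nb\n", b"a\nb", b"a\n\nb\n", b"\n", b"a\r\nb\r\n"]
for s in samples:
    assert lines(s) == lines_splitlines(s)

# Inputs where the two approaches differ:
# an empty text counts as one line with the helper, zero with splitlines().
empty_counts = (lines(b""), lines_splitlines(b""))
# a bare carriage return is a line break for splitlines() but not for count().
cr_counts = (lines(b"a\rb"), lines_splitlines(b"a\rb"))
```

The helper only scans the text for newline bytes, so it avoids allocating a list of per-line objects, which matches the motivation given in the commit message.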