Patchwork cat: increase perf when catting without patterns

login
register
mail settings
Submitter Durham Goode
Date Jan. 11, 2014, 3:17 a.m.
Message ID <c4d55205c7d81950d631.1389410250@dev010.prn1.facebook.com>
Download mbox | patch
Permalink /patch/3291/
State Superseded
Headers show

Comments

Durham Goode - Jan. 11, 2014, 3:17 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1389405865 28800
#      Fri Jan 10 18:04:25 2014 -0800
# Node ID c4d55205c7d81950d631debed3c044c82e87cbd9
# Parent  d2704c48f4176d8cd6f21d33500820d44763585c
cat: increase perf when catting without patterns

When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m). It's
much faster to just read that file directly. In our repo it shaves 60% off the
'hg cat' time.
Martin Geisler - Jan. 11, 2014, 9:10 p.m.
Durham Goode <durham@fb.com> writes:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1389405865 28800
> #      Fri Jan 10 18:04:25 2014 -0800
> # Node ID c4d55205c7d81950d631debed3c044c82e87cbd9
> # Parent  d2704c48f4176d8cd6f21d33500820d44763585c
> cat: increase perf when catting without patterns
>
> +    files = set(m.files())
> +    files.discard('.')
> +    if files and not m.anypats():
> +        for file in sorted(list(files)):

The sorted builtin works on sets too, so I don't think you need to turn
the set into a list first.

> +            try:
> +                write(file)
> +                err = 0
> +            except error.ManifestLookupError:
> +                # It's significantly cheaper to catch the exception than it
> +                # is to do a 'file in ctx' check, due to makefileobj using
> +                # manifest.find instead of manifest.read
> +                m.bad(file, _('no such file in rev %s') % ctx)
> +    else:
> +        for abs in ctx.walk(m):
> +            write(abs)
> +            err = 0

Would ctx.walk be able to have this "if files and no m.anypats()" logic
built-in? That is, could you push the if-statement into that method and
let more callers benefit?
Katsunori FUJIWARA - Jan. 12, 2014, 2:41 p.m.
At Fri, 10 Jan 2014 19:17:30 -0800,
Durham Goode wrote:
> 
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1389405865 28800
> #      Fri Jan 10 18:04:25 2014 -0800
> # Node ID c4d55205c7d81950d631debed3c044c82e87cbd9
> # Parent  d2704c48f4176d8cd6f21d33500820d44763585c
> cat: increase perf when catting without patterns
> 
> When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m). It's
> much faster to just read that file directly. In our repo it shaves 60% off the
> 'hg cat' time.
> 
> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -1151,15 +1151,32 @@
>      ctx = scmutil.revsingle(repo, opts.get('rev'))
>      err = 1
>      m = scmutil.match(ctx, (file1,) + pats, opts)
> -    for abs in ctx.walk(m):
> +
> +    def write(path):
>          fp = cmdutil.makefileobj(repo, opts.get('output'), ctx.node(),
> -                                 pathname=abs)
> -        data = ctx[abs].data()
> +                                 pathname=path)
> +        data = ctx[path].data()
>          if opts.get('decode'):
> -            data = repo.wwritedata(abs, data)
> +            data = repo.wwritedata(path, data)
>          fp.write(data)
>          fp.close()
> -        err = 0
> +
> +    files = set(m.files())
> +    files.discard('.')
> +    if files and not m.anypats():
> +        for file in sorted(list(files)):
> +            try:
> +                write(file)
> +                err = 0

Before this patch, "hg cat" matches the pattern without explicit kind
(like "glob:") against not only file but also directory, because
default pattern kind of "scmutil.match()" is "relpath".

But this patch seems to match the pattern without explicit kind
against only file. This may break backward compatibility.


BTW, pattern matching of this patch seems to be similar to one of
"filelog()"/"contains()" revset predicates. I'm just working for the
patch to add explanation about "the pattern without explicit kind is
not matched against directory for efficiency" for online help of them :-)

> +            except error.ManifestLookupError:
> +                # It's significantly cheaper to catch the exception than it
> +                # is to do a 'file in ctx' check, due to makefileobj using
> +                # manifest.find instead of manifest.read
> +                m.bad(file, _('no such file in rev %s') % ctx)
> +    else:
> +        for abs in ctx.walk(m):
> +            write(abs)
> +            err = 0
>      return err
>  
>  @command('^clone',
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Durham Goode - Jan. 14, 2014, 9:54 p.m.
On 1/12/14 6:41 AM, "FUJIWARA Katsunori" <foozy@lares.dti.ne.jp> wrote:

>At Fri, 10 Jan 2014 19:17:30 -0800,
>Durham Goode wrote:
>> 
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1389405865 28800
>> #      Fri Jan 10 18:04:25 2014 -0800
>> # Node ID c4d55205c7d81950d631debed3c044c82e87cbd9
>> # Parent  d2704c48f4176d8cd6f21d33500820d44763585c
>> cat: increase perf when catting without patterns
>> 
>>
>
>Before this patch, "hg cat" matches the pattern without explicit kind
>(like "glob:") against not only file but also directory, because
>default pattern kind of "scmutil.match()" is "relpath".
>
>But this patch seems to match the pattern without explicit kind
>against only file. This may break backward compatibility.
>
>
>BTW, pattern matching of this patch seems to be similar to one of
>"filelog()"/"contains()" revset predicates. I'm just working for the
>patch to add explanation about "the pattern without explicit kind is
>not matched against directory for efficiency" for online help of them :-)

I've sent a V2 that addresses this issue, and moves some of the benefit
into ctx.walk like Martin suggested.

Patch

diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -1151,15 +1151,32 @@ 
     ctx = scmutil.revsingle(repo, opts.get('rev'))
     err = 1
     m = scmutil.match(ctx, (file1,) + pats, opts)
-    for abs in ctx.walk(m):
+
+    def write(path):
         fp = cmdutil.makefileobj(repo, opts.get('output'), ctx.node(),
-                                 pathname=abs)
-        data = ctx[abs].data()
+                                 pathname=path)
+        data = ctx[path].data()
         if opts.get('decode'):
-            data = repo.wwritedata(abs, data)
+            data = repo.wwritedata(path, data)
         fp.write(data)
         fp.close()
-        err = 0
+
+    files = set(m.files())
+    files.discard('.')
+    if files and not m.anypats():
+        for file in sorted(list(files)):
+            try:
+                write(file)
+                err = 0
+            except error.ManifestLookupError:
+                # It's significantly cheaper to catch the exception than it
+                # is to do a 'file in ctx' check, due to makefileobj using
+                # manifest.find instead of manifest.read
+                m.bad(file, _('no such file in rev %s') % ctx)
+    else:
+        for abs in ctx.walk(m):
+            write(abs)
+            err = 0
     return err
 
 @command('^clone',