Patchwork revset: use manifest.matches in _follow revset

login
register
mail settings
Submitter Durham Goode
Date Feb. 5, 2016, 8:47 p.m.
Message ID <4bf06bc950fcbee8fc65.1454705248@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13016/
State Superseded
Commit 86c4cbdaffeeb75b9f0298774ce6b3db8d5aa0aa
Headers show

Comments

Durham Goode - Feb. 5, 2016, 8:47 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1454705079 28800
#      Fri Feb 05 12:44:39 2016 -0800
# Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9
# Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
revset: use manifest.matches in _follow revset

The old _follow revset iterated over every file in the commit and checked if it
matched. For repos with large manifests, this could take 500ms. By switching to
use manifest.matches() we can take advantage of the fastpaths built in to
manifest.py that allows iterating over only the files in the matcher when it's a
simple matcher. This brings the time spent down from 500ms to 0ms during simple
operations like 'hg log -f file.txt'.
Sean Farley - Feb. 5, 2016, 9:18 p.m.
Durham Goode <durham@fb.com> writes:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1454705079 28800
> #      Fri Feb 05 12:44:39 2016 -0800
> # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9
> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
> revset: use manifest.matches in _follow revset
>
> The old _follow revset iterated over every file in the commit and checked if it
> matched. For repos with large manifests, this could take 500ms. By switching to
> use manifest.matches() we can take advantage of the fastpaths built in to
> manifest.py that allows iterating over only the files in the matcher when it's a
> simple matcher. This brings the time spent down from 500ms to 0ms during simple
> operations like 'hg log -f file.txt'.

This looks great to me. Awesome speed up.
Martin von Zweigbergk - Feb. 5, 2016, 9:22 p.m.
On Fri, Feb 5, 2016 at 12:47 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1454705079 28800
> #      Fri Feb 05 12:44:39 2016 -0800
> # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9
> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
> revset: use manifest.matches in _follow revset
>
> The old _follow revset iterated over every file in the commit and checked if it
> matched. For repos with large manifests, this could take 500ms. By switching to
> use manifest.matches() we can take advantage of the fastpaths built in to
> manifest.py that allows iterating over only the files in the matcher when it's a
> simple matcher. This brings the time spent down from 500ms to 0ms during simple
> operations like 'hg log -f file.txt'.

Nice!

>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo
>          matcher = matchmod.match(repo.root, repo.getcwd(), [x],
>                                   ctx=repo[None], default='path')
>
> +        files = c.manifest().matches(matcher)

I think you can even use c.manifest.walk(matcher) here to avoid
creating the manifest. Obviously we don't need it to be faster than
0ms, but perhaps it would be noticeable when many files are matched.
I'll leave it to you to test or just dismiss that.

> +
>          s = set()
> -        for fname in c:
> -            if matcher(fname):
> -                fctx = c[fname]
> -                s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
> -                # include the revision responsible for the most recent version
> -                s.add(fctx.introrev())
> +        for fname in files:
> +            fctx = c[fname]
> +            s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
> +            # include the revision responsible for the most recent version
> +            s.add(fctx.introrev())
>      else:
>          s = _revancestors(repo, baseset([c.rev()]), followfirst)
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - Feb. 5, 2016, 9:27 p.m.
On 2/5/16 1:22 PM, Martin von Zweigbergk wrote:
> On Fri, Feb 5, 2016 at 12:47 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1454705079 28800
>> #      Fri Feb 05 12:44:39 2016 -0800
>> # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9
>> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
>> revset: use manifest.matches in _follow revset
>>
>> The old _follow revset iterated over every file in the commit and checked if it
>> matched. For repos with large manifests, this could take 500ms. By switching to
>> use manifest.matches() we can take advantage of the fastpaths built in to
>> manifest.py that allows iterating over only the files in the matcher when it's a
>> simple matcher. This brings the time spent down from 500ms to 0ms during simple
>> operations like 'hg log -f file.txt'.
> Nice!
>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo
>>           matcher = matchmod.match(repo.root, repo.getcwd(), [x],
>>                                    ctx=repo[None], default='path')
>>
>> +        files = c.manifest().matches(matcher)
> I think you can even use c.manifest.walk(matcher) here to avoid
> creating the manifest. Obviously we don't need it to be faster than
> 0ms, but perhaps it would be noticeable when many files are matched.
> I'll leave it to you to test or just dismiss that.
>
Interesting.  It's probably doesn't matter for this revset, but it's a 
good idea regardless.  I'll send a V2 with that.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1086,13 +1086,14 @@  def _follow(repo, subset, x, name, follo
         matcher = matchmod.match(repo.root, repo.getcwd(), [x],
                                  ctx=repo[None], default='path')
 
+        files = c.manifest().matches(matcher)
+
         s = set()
-        for fname in c:
-            if matcher(fname):
-                fctx = c[fname]
-                s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
-                # include the revision responsible for the most recent version
-                s.add(fctx.introrev())
+        for fname in files:
+            fctx = c[fname]
+            s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
+            # include the revision responsible for the most recent version
+            s.add(fctx.introrev())
     else:
         s = _revancestors(repo, baseset([c.rev()]), followfirst)