Patchwork [V2] revset: use manifest.matches in _follow revset

login
register
mail settings
Submitter Durham Goode
Date Feb. 5, 2016, 9:30 p.m.
Message ID <3e36d1273ba86c354729.1454707835@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13022/
State Accepted
Headers show

Comments

Durham Goode - Feb. 5, 2016, 9:30 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1454707825 28800
#      Fri Feb 05 13:30:25 2016 -0800
# Node ID 3e36d1273ba86c354729e1876a6585b8ab37a681
# Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
revset: use manifest.matches in _follow revset

The old _follow revset iterated over every file in the commit and checked if it
matched. For repos with large manifests, this could take 500ms. By switching to
use manifest.matches() we can take advantage of the fastpaths built in to
manifest.py that allows iterating over only the files in the matcher when it's a
simple matcher. This brings the time spent down from 500ms to 0ms during simple
operations like 'hg log -f file.txt'.
Augie Fackler - Feb. 5, 2016, 9:47 p.m.
On Fri, Feb 05, 2016 at 01:30:35PM -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1454707825 28800
> #      Fri Feb 05 13:30:25 2016 -0800
> # Node ID 3e36d1273ba86c354729e1876a6585b8ab37a681
> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
> revset: use manifest.matches in _follow revset

Very nice. Queued.

>
> The old _follow revset iterated over every file in the commit and checked if it
> matched. For repos with large manifests, this could take 500ms. By switching to
> use manifest.matches() we can take advantage of the fastpaths built in to
> manifest.py that allows iterating over only the files in the matcher when it's a
> simple matcher. This brings the time spent down from 500ms to 0ms during simple
> operations like 'hg log -f file.txt'.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo
>          matcher = matchmod.match(repo.root, repo.getcwd(), [x],
>                                   ctx=repo[None], default='path')
>
> +        files = c.manifest().walk(matcher)
> +
>          s = set()
> -        for fname in c:
> -            if matcher(fname):
> -                fctx = c[fname]
> -                s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
> -                # include the revision responsible for the most recent version
> -                s.add(fctx.introrev())
> +        for fname in files:
> +            fctx = c[fname]
> +            s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
> +            # include the revision responsible for the most recent version
> +            s.add(fctx.introrev())
>      else:
>          s = _revancestors(repo, baseset([c.rev()]), followfirst)
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Sean Farley - Feb. 5, 2016, 9:48 p.m.
Durham Goode <durham@fb.com> writes:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1454707825 28800
> #      Fri Feb 05 13:30:25 2016 -0800
> # Node ID 3e36d1273ba86c354729e1876a6585b8ab37a681
> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
> revset: use manifest.matches in _follow revset
>
> The old _follow revset iterated over every file in the commit and checked if it
> matched. For repos with large manifests, this could take 500ms. By switching to
> use manifest.matches() we can take advantage of the fastpaths built in to
> manifest.py that allows iterating over only the files in the matcher when it's a
> simple matcher. This brings the time spent down from 500ms to 0ms during simple
> operations like 'hg log -f file.txt'.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo
>          matcher = matchmod.match(repo.root, repo.getcwd(), [x],
>                                   ctx=repo[None], default='path')
>  
> +        files = c.manifest().walk(matcher)

Looks like this is the previous patch?
Sean Farley - Feb. 5, 2016, 9:50 p.m.
Sean Farley <sean@farley.io> writes:

> Durham Goode <durham@fb.com> writes:
>
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1454707825 28800
>> #      Fri Feb 05 13:30:25 2016 -0800
>> # Node ID 3e36d1273ba86c354729e1876a6585b8ab37a681
>> # Parent  01a5143cd25f285f8c745a92986cd7186bb32c90
>> revset: use manifest.matches in _follow revset
>>
>> The old _follow revset iterated over every file in the commit and checked if it
>> matched. For repos with large manifests, this could take 500ms. By switching to
>> use manifest.matches() we can take advantage of the fastpaths built in to
>> manifest.py that allows iterating over only the files in the matcher when it's a
>> simple matcher. This brings the time spent down from 500ms to 0ms during simple
>> operations like 'hg log -f file.txt'.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo
>>          matcher = matchmod.match(repo.root, repo.getcwd(), [x],
>>                                   ctx=repo[None], default='path')
>>  
>> +        files = c.manifest().walk(matcher)
>
> Looks like this is the previous patch?

Nevermind. My mistake.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1086,13 +1086,14 @@  def _follow(repo, subset, x, name, follo
         matcher = matchmod.match(repo.root, repo.getcwd(), [x],
                                  ctx=repo[None], default='path')
 
+        files = c.manifest().walk(matcher)
+
         s = set()
-        for fname in c:
-            if matcher(fname):
-                fctx = c[fname]
-                s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
-                # include the revision responsible for the most recent version
-                s.add(fctx.introrev())
+        for fname in files:
+            fctx = c[fname]
+            s = s.union(set(c.rev() for c in fctx.ancestors(followfirst)))
+            # include the revision responsible for the most recent version
+            s.add(fctx.introrev())
     else:
         s = _revancestors(repo, baseset([c.rev()]), followfirst)