Patchwork [1,of,2,V2] changectx: increase perf of walk function

login
register
mail settings
Submitter Durham Goode
Date Jan. 14, 2014, 9:52 p.m.
Message ID <9c531e73a03eb7ddd809.1389736350@dev010.prn1.facebook.com>
Download mbox | patch
Permalink /patch/3322/
State Accepted
Commit 8dc254198a8f7ec51e459d3804fe0de91a17979a
Headers show

Comments

Durham Goode - Jan. 14, 2014, 9:52 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1389736159 28800
#      Tue Jan 14 13:49:19 2014 -0800
# Node ID 9c531e73a03eb7ddd809da0778d98612ae41134e
# Parent  47d0843647d1e32f6af4482867327cec5db11a1f
changectx: increase perf of walk function

When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
applied the regex to every file in the manifest.

This changes changectx.walk to iterate over just the files in the regex, if no
other patterns are specified. This cuts hg cat time by 50% in our repo and
probably benefits a few other commands as well.
Siddharth Agarwal - Jan. 15, 2014, 1:35 a.m.
On 01/14/2014 01:52 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1389736159 28800
> #      Tue Jan 14 13:49:19 2014 -0800
> # Node ID 9c531e73a03eb7ddd809da0778d98612ae41134e
> # Parent  47d0843647d1e32f6af4482867327cec5db11a1f
> changectx: increase perf of walk function
>
> When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
> applied the regex to every file in the manifest.
>
> This changes changectx.walk to iterate over just the files in the regex, if no
> other patterns are specified. This cuts hg cat time by 50% in our repo and
> probably benefits a few other commands as well.
>
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -410,6 +410,15 @@
>           # for dirstate.walk, files=['.'] means "walk the whole tree".
>           # follow that here, too
>           fset.discard('.')
> +
> +        # avoid the entire walk if we're only looking for specific files
> +        if fset and not match.anypats():
> +            if util.all([fn in self for fn in fset]):
> +                for fn in sorted(fset):
> +                    if match(fn):
> +                        yield fn

I think you don't need 'if match(fn)': since fset is derived from 
match.files(), you should just be able to yield fn.
Durham Goode - Jan. 15, 2014, 2 a.m.
On 1/14/14 5:35 PM, "Siddharth Agarwal" <sid0@fb.com> wrote:

>On 01/14/2014 01:52 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1389736159 28800
>> #      Tue Jan 14 13:49:19 2014 -0800
>> # Node ID 9c531e73a03eb7ddd809da0778d98612ae41134e
>> # Parent  47d0843647d1e32f6af4482867327cec5db11a1f
>> changectx: increase perf of walk function
>>
>> When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m)
>>which
>> applied the regex to every file in the manifest.
>>
>> This changes changectx.walk to iterate over just the files in the
>>regex, if no
>> other patterns are specified. This cuts hg cat time by 50% in our repo
>>and
>> probably benefits a few other commands as well.
>>
>> diff --git a/mercurial/context.py b/mercurial/context.py
>> --- a/mercurial/context.py
>> +++ b/mercurial/context.py
>> @@ -410,6 +410,15 @@
>>           # for dirstate.walk, files=['.'] means "walk the whole tree".
>>           # follow that here, too
>>           fset.discard('.')
>> +
>> +        # avoid the entire walk if we're only looking for specific
>>files
>> +        if fset and not match.anypats():
>> +            if util.all([fn in self for fn in fset]):
>> +                for fn in sorted(fset):
>> +                    if match(fn):
>> +                        yield fn
>
>I think you don't need 'if match(fn)': since fset is derived from
>match.files(), you should just be able to yield fn.
>

I originally did that, but largefiles abuses match by setting it's own
function to match.matchfn and building some state for every matched file
name. So I still need to call match on hits.
Siddharth Agarwal - Jan. 15, 2014, 7:07 a.m.
On 01/14/2014 06:00 PM, Durham Goode wrote:
> I originally did that, but largefiles abuses match by setting it's own
> function to match.matchfn and building some state for every matched file
> name. So I still need to call match on hits.

In that case, does that need to be accounted for in patch 2 as well?

(Grr, largefiles...)
Mads Kiilerich - Jan. 15, 2014, 2:42 p.m.
On 01/15/2014 08:07 AM, Siddharth Agarwal wrote:
> On 01/14/2014 06:00 PM, Durham Goode wrote:
>> I originally did that, but largefiles abuses match by setting it's own
>> function to match.matchfn and building some state for every matched file
>> name. So I still need to call match on hits.
>
> In that case, does that need to be accounted for in patch 2 as well?
>
> (Grr, largefiles...)

Indeed. Especially with the .hglf prefix and in combination with the 
different handling of matching and unmatching patterns in different places.

The least insane way to improve that seems to be to get rid of the .hglf 
prefix - especially in the layers where matchers are used.

/Mads
Durham Goode - Jan. 15, 2014, 6:51 p.m.
On 1/14/14 11:07 PM, "Siddharth Agarwal" <sid0@fb.com> wrote:

>On 01/14/2014 06:00 PM, Durham Goode wrote:
>> I originally did that, but largefiles abuses match by setting it's own
>> function to match.matchfn and building some state for every matched file
>> name. So I still need to call match on hits.
>
>In that case, does that need to be accounted for in patch 2 as well?
>
>(Grr, largefiles...)
>

It's not necessary in patch 2 because largefiles does not call the
original cat function.  Even if it did, the match object is not passed in,
so an extension has no way of messing the functions on the match object
that's inside the original cat function.
Durham Goode - Jan. 16, 2014, 7:45 p.m.
On 1/15/14 10:51 AM, "Durham Goode" <durham@fb.com> wrote:

>On 1/14/14 11:07 PM, "Siddharth Agarwal" <sid0@fb.com> wrote:
>
>>On 01/14/2014 06:00 PM, Durham Goode wrote:
>>> I originally did that, but largefiles abuses match by setting it's own
>>> function to match.matchfn and building some state for every matched
>>>file
>>> name. So I still need to call match on hits.
>>
>>In that case, does that need to be accounted for in patch 2 as well?
>>
>>(Grr, largefiles...)
>>
>
>It's not necessary in patch 2 because largefiles does not call the
>original cat function.  Even if it did, the match object is not passed in,
>so an extension has no way of messing the functions on the match object
>that's inside the original cat function.

If no one else has issues, this patch series is ready for queueing.
Siddharth Agarwal - Jan. 24, 2014, 8:10 a.m.
On 01/14/2014 06:00 PM, Durham Goode wrote:
> I originally did that, but largefiles abuses match by setting it's own
> function to match.matchfn and building some state for every matched file
> name. So I still need to call match on hits.

You'll be happy to know that my in-progress revert rewrite's having the 
same issue: it no longer calls the match function for each file being 
reverted in the explicit case, so largefiles doesn't know which files to 
revert. :(

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -410,6 +410,15 @@ 
         # for dirstate.walk, files=['.'] means "walk the whole tree".
         # follow that here, too
         fset.discard('.')
+
+        # avoid the entire walk if we're only looking for specific files
+        if fset and not match.anypats():
+            if util.all([fn in self for fn in fset]):
+                for fn in sorted(fset):
+                    if match(fn):
+                        yield fn
+                raise StopIteration
+
         for fn in self:
             if fn in fset:
                 # specified pattern is the exact name