Patchwork [3,of,4] largefiles: shortcircuit status code also for non-matching patterns

login
register
mail settings
Submitter Martin von Zweigbergk
Date Nov. 2, 2014, 9:43 p.m.
Message ID <b89ddc59955d4ffc106c.1414964595@handduk2.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/6538/
State Accepted
Headers show

Comments

Martin von Zweigbergk - Nov. 2, 2014, 9:43 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1414789907 25200
#      Fri Oct 31 14:11:47 2014 -0700
# Branch stable
# Node ID b89ddc59955d4ffc106c8566af52111f7d52f718
# Parent  306755539e7e90d6ab8d5dccd68fc8b07042636a
largefiles: shortcircuit status code also for non-matching patterns

We currently shortcircuit the checking for large file standins if only
patterns of type 'path' are given on the command line. That makes e.g.
"hg st 'glob:foo/**'" unnecessarily slow when the only large files are
in a sibling directory.

Relax the check to be that it is not an always-matcher and that no
large files match the patterns given on the command line.

Note that before this change, only the latter of the following two
would show the status of files in .hglf (since the -I makes
match.anypats() true). After this change, they both display the
status. This behavior doesn't seem correct, but it would be a separate
change to explicitly filter out .hglf even in the shortcircuit case.

  hg st .hglf/$file
  hg st .hglf/$file -I .
Mads Kiilerich - Nov. 3, 2014, 1:18 a.m.
On 11/02/2014 10:43 PM, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1414789907 25200
> #      Fri Oct 31 14:11:47 2014 -0700
> # Branch stable
> # Node ID b89ddc59955d4ffc106c8566af52111f7d52f718
> # Parent  306755539e7e90d6ab8d5dccd68fc8b07042636a
> largefiles: shortcircuit status code also for non-matching patterns
>
> We currently shortcircuit the checking for large file standins if only
> patterns of type 'path' are given on the command line. That makes e.g.
> "hg st 'glob:foo/**'" unnecessarily slow when the only large files are
> in a sibling directory.
>
> Relax the check to be that it is not an always-matcher and that no
> large files match the patterns given on the command line.
>
> Note that before this change, only the latter of the following two
> would show the status of files in .hglf (since the -I makes
> match.anypats() true). After this change, they both display the
> status. This behavior doesn't seem correct, but it would be a separate
> change to explicitly filter out .hglf even in the shortcircuit case.
>
>    hg st .hglf/$file
>    hg st .hglf/$file -I .
> diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
> --- a/hgext/largefiles/reposetup.py
> +++ b/hgext/largefiles/reposetup.py
> @@ -102,12 +102,12 @@
>                   except error.LockError:
>                       pass
>   
> -                # First check if there were files specified on the
> -                # command line.  If there were, and none of them were
> +                # First check if paths or patterns were specified on the
> +                # command line.  If there were, and they don't match any
>                   # largefiles, we should just bail here and let super
>                   # handle it -- thus gaining a big performance boost.
>                   lfdirstate = lfutil.openlfdirstate(ui, self)
> -                if match.files() and not match.anypats():
> +                if not match.always():
>                       for f in lfdirstate:
>                           if match(f):
>                               break

This all seems very reasonable ... and if not, we will find out and get 
a good test case.

But please, if this make us able to handle more cases, add test cases 
for them.

/Mads
Martin von Zweigbergk - Nov. 3, 2014, 5:12 a.m.
On Sun Nov 02 2014 at 5:18:24 PM Mads Kiilerich <mads@kiilerich.com> wrote:

> On 11/02/2014 10:43 PM, Martin von Zweigbergk wrote:
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1414789907 25200
> > #      Fri Oct 31 14:11:47 2014 -0700
> > # Branch stable
> > # Node ID b89ddc59955d4ffc106c8566af52111f7d52f718
> > # Parent  306755539e7e90d6ab8d5dccd68fc8b07042636a
> > largefiles: shortcircuit status code also for non-matching patterns
> >
> > We currently shortcircuit the checking for large file standins if only
> > patterns of type 'path' are given on the command line. That makes e.g.
> > "hg st 'glob:foo/**'" unnecessarily slow when the only large files are
> > in a sibling directory.
> >
> > Relax the check to be that it is not an always-matcher and that no
> > large files match the patterns given on the command line.
> >
> > Note that before this change, only the latter of the following two
> > would show the status of files in .hglf (since the -I makes
> > match.anypats() true). After this change, they both display the
> > status. This behavior doesn't seem correct, but it would be a separate
> > change to explicitly filter out .hglf even in the shortcircuit case.
> >
> >    hg st .hglf/$file
> >    hg st .hglf/$file -I .
> > diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.
> py
> > --- a/hgext/largefiles/reposetup.py
> > +++ b/hgext/largefiles/reposetup.py
> > @@ -102,12 +102,12 @@
> >                   except error.LockError:
> >                       pass
> >
> > -                # First check if there were files specified on the
> > -                # command line.  If there were, and none of them were
> > +                # First check if paths or patterns were specified on the
> > +                # command line.  If there were, and they don't match any
> >                   # largefiles, we should just bail here and let super
> >                   # handle it -- thus gaining a big performance boost.
> >                   lfdirstate = lfutil.openlfdirstate(ui, self)
> > -                if match.files() and not match.anypats():
> > +                if not match.always():
> >                       for f in lfdirstate:
> >                           if match(f):
> >                               break
>
> This all seems very reasonable ... and if not, we will find out and get
> a good test case.
>
> But please, if this make us able to handle more cases, add test cases
> for them.
>

I don't think it should change behavior (except as noted in the commit
message); it's just an optimization. I should probably have included
timings, though. It speeds up "hg st 'glob:browser/**'" on the Mozilla repo
from 1.036s to 0.249s when there are no large files in that directory.

Patch

diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py
+++ b/hgext/largefiles/reposetup.py
@@ -102,12 +102,12 @@ 
                 except error.LockError:
                     pass
 
-                # First check if there were files specified on the
-                # command line.  If there were, and none of them were
+                # First check if paths or patterns were specified on the
+                # command line.  If there were, and they don't match any
                 # largefiles, we should just bail here and let super
                 # handle it -- thus gaining a big performance boost.
                 lfdirstate = lfutil.openlfdirstate(ui, self)
-                if match.files() and not match.anypats():
+                if not match.always():
                     for f in lfdirstate:
                         if match(f):
                             break