Patchwork [2,of,2] lfs: migrate most file filtering from threshold to custom filter

login
register
mail settings
Submitter Matt Harbison
Date Jan. 13, 2018, 7:23 a.m.
Message ID <66976f55793cced57929.1515828196@Envy>
Download mbox | patch
Permalink /patch/26725/
State Accepted
Headers show

Comments

Matt Harbison - Jan. 13, 2018, 7:23 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1514706889 18000
#      Sun Dec 31 02:54:49 2017 -0500
# Node ID 66976f55793cced57929dedc8993204be340c717
# Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
lfs: migrate most file filtering from threshold to custom filter

Migrate `lfs.threshold` to more powerful `lfs.filter` added by D4990618 so
people can specify what files to be stored in LFS with more flexibility.

This patch was authored by Jun Wu for the fb-experimental repo, to avoid using
matcher for efficiency[1].  All I've changed here is to register the new
'lfs.track' default so that the tests run cleanly, and adapt the subsequent
language changes.  Migrating the remaining uses of 'lfs.threshold' can be done
separately since there's a fallback in place.

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-December/109388.html
Yuya Nishihara - Jan. 14, 2018, 2:11 a.m.
On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1514706889 18000
> #      Sun Dec 31 02:54:49 2017 -0500
> # Node ID 66976f55793cced57929dedc8993204be340c717
> # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
> lfs: migrate most file filtering from threshold to custom filter

Queued, thanks.

> -    threshold = repo.ui.configbytes('lfs', 'threshold')
> +    trackspec = repo.ui.config('lfs', 'track')
>  
> -    repo.svfs.options['lfsthreshold'] = threshold
> +    # deprecated config: lfs.threshold
> +    threshold = repo.ui.configbytes('lfs', 'threshold')
> +    if threshold:
> +        trackspec = "(%s) | size('>%s')" % (trackspec, threshold)

I've added fileset.parse(trackspec) to detect bad spec like '..) .. (..', and
s/%s/%d/ to make sure threshold is integer so no need of string escape.
Matt Harbison - Jan. 14, 2018, 5:22 a.m.
On Sat, 13 Jan 2018 21:11:45 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1514706889 18000
>> #      Sun Dec 31 02:54:49 2017 -0500
>> # Node ID 66976f55793cced57929dedc8993204be340c717
>> # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
>> lfs: migrate most file filtering from threshold to custom filter
>
> Queued, thanks.
>
>> -    threshold = repo.ui.configbytes('lfs', 'threshold')
>> +    trackspec = repo.ui.config('lfs', 'track')
>>
>> -    repo.svfs.options['lfsthreshold'] = threshold
>> +    # deprecated config: lfs.threshold
>> +    threshold = repo.ui.configbytes('lfs', 'threshold')
>> +    if threshold:
>> +        trackspec = "(%s) | size('>%s')" % (trackspec, threshold)
>
> I've added fileset.parse(trackspec) to detect bad spec like '..) ..  
> (..', and
> s/%s/%d/ to make sure threshold is integer so no need of string escape.

Thanks.

I assume that your amend obsoleted something local only to your system?   
The reason I ask is because I ran `hg extdiff --patch --hidden -r  
'precursors(c780e06)' -r c780e06`, and it aborted with an empty revision.   
It's the precursors() predicate coming up empty.  I ended up using  
'last(allprecursors(c780e06))', and that found what I submitted.  But that  
seems inconsistent with the obsolete line of that logged revision pointing  
directly to c780e06.

At minimum, it would be nice to have a shorter way to spell that.  But it  
seems weird that the log predicate would care about revisions not in the  
repo.  Boris- it looks like '{precursors}' will skip the obsolete revision  
not on my system, and agrees with 'last(..)'.
Yuya Nishihara - Jan. 14, 2018, 8:26 a.m.
On Sun, 14 Jan 2018 00:22:05 -0500, Matt Harbison wrote:
> On Sat, 13 Jan 2018 21:11:45 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> > On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1514706889 18000
> >> #      Sun Dec 31 02:54:49 2017 -0500
> >> # Node ID 66976f55793cced57929dedc8993204be340c717
> >> # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
> >> lfs: migrate most file filtering from threshold to custom filter
> >
> > Queued, thanks.
> >
> >> -    threshold = repo.ui.configbytes('lfs', 'threshold')
> >> +    trackspec = repo.ui.config('lfs', 'track')
> >>
> >> -    repo.svfs.options['lfsthreshold'] = threshold
> >> +    # deprecated config: lfs.threshold
> >> +    threshold = repo.ui.configbytes('lfs', 'threshold')
> >> +    if threshold:
> >> +        trackspec = "(%s) | size('>%s')" % (trackspec, threshold)
> >
> > I've added fileset.parse(trackspec) to detect bad spec like '..) ..  
> > (..', and
> > s/%s/%d/ to make sure threshold is integer so no need of string escape.
> 
> Thanks.
> 
> I assume that your amend obsoleted something local only to your system?   
> The reason I ask is because I ran `hg extdiff --patch --hidden -r  
> 'precursors(c780e06)' -r c780e06`, and it aborted with an empty revision.   
> It's the precursors() predicate coming up empty.  I ended up using  
> 'last(allprecursors(c780e06))', and that found what I submitted.  But that  
> seems inconsistent with the obsolete line of that logged revision pointing  
> directly to c780e06.

What I generally do is:

 hg import --obsolete mails  # yours->mine
 hg amend                    # mine->committed

> At minimum, it would be nice to have a shorter way to spell that.

I think allprecursors() should be renamed to precursors(), and delete/rename
the original precursors().

> But it
> seems weird that the log predicate would care about revisions not in the  
> repo.  Boris- it looks like '{precursors}' will skip the obsolete revision  
> not on my system, and agrees with 'last(..)'.
>
Boris Feld - Jan. 16, 2018, 10:26 a.m.
On Sun, 2018-01-14 at 17:26 +0900, Yuya Nishihara wrote:
> On Sun, 14 Jan 2018 00:22:05 -0500, Matt Harbison wrote:
> > On Sat, 13 Jan 2018 21:11:45 -0500, Yuya Nishihara <yuya@tcha.org>
> > wrote:
> > 
> > > On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
> > > > # HG changeset patch
> > > > # User Matt Harbison <matt_harbison@yahoo.com>
> > > > # Date 1514706889 18000
> > > > #      Sun Dec 31 02:54:49 2017 -0500
> > > > # Node ID 66976f55793cced57929dedc8993204be340c717
> > > > # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
> > > > lfs: migrate most file filtering from threshold to custom
> > > > filter
> > > 
> > > Queued, thanks.
> > > 
> > > > -    threshold = repo.ui.configbytes('lfs', 'threshold')
> > > > +    trackspec = repo.ui.config('lfs', 'track')
> > > > 
> > > > -    repo.svfs.options['lfsthreshold'] = threshold
> > > > +    # deprecated config: lfs.threshold
> > > > +    threshold = repo.ui.configbytes('lfs', 'threshold')
> > > > +    if threshold:
> > > > +        trackspec = "(%s) | size('>%s')" % (trackspec,
> > > > threshold)
> > > 
> > > I've added fileset.parse(trackspec) to detect bad spec like '..)
> > > ..  
> > > (..', and
> > > s/%s/%d/ to make sure threshold is integer so no need of string
> > > escape.
> > 
> > Thanks.
> > 
> > I assume that your amend obsoleted something local only to your
> > system?   
> > The reason I ask is because I ran `hg extdiff --patch --hidden -r  
> > 'precursors(c780e06)' -r c780e06`, and it aborted with an empty
> > revision.   
> > It's the precursors() predicate coming up empty.  I ended up
> > using  
> > 'last(allprecursors(c780e06))', and that found what I
> > submitted.  But that  
> > seems inconsistent with the obsolete line of that logged revision
> > pointing  
> > directly to c780e06.
> 
> What I generally do is:
> 
>  hg import --obsolete mails  # yours->mine
>  hg amend                    # mine->committed
> 
> > At minimum, it would be nice to have a shorter way to spell that.
> 
> I think allprecursors() should be renamed to precursors(), and
> delete/rename
> the original precursors().
> 
> > But it
> > seems weird that the log predicate would care about revisions not
> > in the  
> > repo.  Boris- it looks like '{precursors}' will skip the obsolete
> > revision  
> > not on my system, and agrees with 'last(..)'.
> >  

I checked the precursors revset and the behavior is not the one
expected, thank you for pointing that.


I also realized that the revset is in Evolve and not in Core, which
makes it a good opportunity to upstream it.

I'm not sure if we should keep only the equivalent of allprecursors(),
what about having a predecessor() and allpredecessors() revsets.

predecessors() would returns the closests locally known predecessors,
like the {predecessors} template keyword while allpredecessors() would
returns all the locally known predecessors.

What do you think?
Matt Harbison - Jan. 16, 2018, 1:05 p.m.
> On Jan 16, 2018, at 5:26 AM, Boris Feld <boris.feld@octobus.net> wrote:
> 
>> On Sun, 2018-01-14 at 17:26 +0900, Yuya Nishihara wrote:
>>> On Sun, 14 Jan 2018 00:22:05 -0500, Matt Harbison wrote:
>>> On Sat, 13 Jan 2018 21:11:45 -0500, Yuya Nishihara <yuya@tcha.org>
>>> wrote:
>>> 
>>>>> On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
>>>>> # HG changeset patch
>>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>>> # Date 1514706889 18000
>>>>> #      Sun Dec 31 02:54:49 2017 -0500
>>>>> # Node ID 66976f55793cced57929dedc8993204be340c717
>>>>> # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
>>>>> lfs: migrate most file filtering from threshold to custom
>>>>> filter
>>>> 
>>>> Queued, thanks.
>>>> 
>>>>> -    threshold = repo.ui.configbytes('lfs', 'threshold')
>>>>> +    trackspec = repo.ui.config('lfs', 'track')
>>>>> 
>>>>> -    repo.svfs.options['lfsthreshold'] = threshold
>>>>> +    # deprecated config: lfs.threshold
>>>>> +    threshold = repo.ui.configbytes('lfs', 'threshold')
>>>>> +    if threshold:
>>>>> +        trackspec = "(%s) | size('>%s')" % (trackspec,
>>>>> threshold)
>>>> 
>>>> I've added fileset.parse(trackspec) to detect bad spec like '..)
>>>> ..  
>>>> (..', and
>>>> s/%s/%d/ to make sure threshold is integer so no need of string
>>>> escape.
>>> 
>>> Thanks.
>>> 
>>> I assume that your amend obsoleted something local only to your
>>> system?   
>>> The reason I ask is because I ran `hg extdiff --patch --hidden -r  
>>> 'precursors(c780e06)' -r c780e06`, and it aborted with an empty
>>> revision.   
>>> It's the precursors() predicate coming up empty.  I ended up
>>> using  
>>> 'last(allprecursors(c780e06))', and that found what I
>>> submitted.  But that  
>>> seems inconsistent with the obsolete line of that logged revision
>>> pointing  
>>> directly to c780e06.
>> 
>> What I generally do is:
>> 
>> hg import --obsolete mails  # yours->mine
>> hg amend                    # mine->committed
>> 
>>> At minimum, it would be nice to have a shorter way to spell that.
>> 
>> I think allprecursors() should be renamed to precursors(), and
>> delete/rename
>> the original precursors().
>> 
>>> But it
>>> seems weird that the log predicate would care about revisions not
>>> in the  
>>> repo.  Boris- it looks like '{precursors}' will skip the obsolete
>>> revision  
>>> not on my system, and agrees with 'last(..)'.
>>> 
> 
> I checked the precursors revset and the behavior is not the one
> expected, thank you for pointing that.
> 
> 
> I also realized that the revset is in Evolve and not in Core, which
> makes it a good opportunity to upstream it.
> 
> I'm not sure if we should keep only the equivalent of allprecursors(),
> what about having a predecessor() and allpredecessors() revsets.
> 
> predecessors() would returns the closests locally known predecessors,
> like the {predecessors} template keyword while allpredecessors() would
> returns all the locally known predecessors.
> 
> What do you think?

I think a dedicated revset to find the first generation, for lack of a better word, of predecessors is useful.  I use it all the time to ensure rebase/evolve conflict resolution didn’t drop anything, though it only works if it wasn’t split or folded.

There was a proposal for short handing something like this, so maybe a dedicated predicate won’t be needed in the future.  But it would be nice if the template and revset agreed, and I don’t see an “all” template as being as useful as the current one.

http://mercurial.markmail.org/thread/sjnnwa43s4eksu62
Boris Feld - Jan. 16, 2018, 1:20 p.m.
On Tue, 2018-01-16 at 08:05 -0500, Matt Harbison wrote:
> On Jan 16, 2018, at 5:26 AM, Boris Feld <boris.feld@octobus.net>
> wrote:
> 
> > On Sun, 2018-01-14 at 17:26 +0900, Yuya Nishihara wrote:
> > > On Sun, 14 Jan 2018 00:22:05 -0500, Matt Harbison wrote:
> > > > On Sat, 13 Jan 2018 21:11:45 -0500, Yuya Nishihara <yuya@tcha.o
> > > > rg>
> > > > wrote:
> > > > 
> > > > > On Sat, 13 Jan 2018 02:23:16 -0500, Matt Harbison wrote:
> > > > > > # HG changeset patch
> > > > > > # User Matt Harbison <matt_harbison@yahoo.com>
> > > > > > # Date 1514706889 18000
> > > > > > #      Sun Dec 31 02:54:49 2017 -0500
> > > > > > # Node ID 66976f55793cced57929dedc8993204be340c717
> > > > > > # Parent  868cc63bfe9d7d7f5b40bc8cd70175cf1a608a95
> > > > > > lfs: migrate most file filtering from threshold to custom
> > > > > > filter
> > > > > 
> > > > > Queued, thanks.
> > > > > 
> > > > > > -    threshold = repo.ui.configbytes('lfs', 'threshold')
> > > > > > +    trackspec = repo.ui.config('lfs', 'track')
> > > > > > 
> > > > > > -    repo.svfs.options['lfsthreshold'] = threshold
> > > > > > +    # deprecated config: lfs.threshold
> > > > > > +    threshold = repo.ui.configbytes('lfs', 'threshold')
> > > > > > +    if threshold:
> > > > > > +        trackspec = "(%s) | size('>%s')" % (trackspec,
> > > > > > threshold)
> > > > > 
> > > > > I've added fileset.parse(trackspec) to detect bad spec like
> > > > > '..)
> > > > > ..  
> > > > > (..', and
> > > > > s/%s/%d/ to make sure threshold is integer so no need of
> > > > > string
> > > > > escape.
> > > > 
> > > > Thanks.
> > > > 
> > > > I assume that your amend obsoleted something local only to your
> > > > system?   
> > > > The reason I ask is because I ran `hg extdiff --patch --hidden
> > > > -r  
> > > > 'precursors(c780e06)' -r c780e06`, and it aborted with an empty
> > > > revision.   
> > > > It's the precursors() predicate coming up empty.  I ended up
> > > > using  
> > > > 'last(allprecursors(c780e06))', and that found what I
> > > > submitted.  But that  
> > > > seems inconsistent with the obsolete line of that logged
> > > > revision
> > > > pointing  
> > > > directly to c780e06.
> > > 
> > > What I generally do is:
> > > 
> > >  hg import --obsolete mails  # yours->mine
> > >  hg amend                    # mine->committed
> > > 
> > > > At minimum, it would be nice to have a shorter way to spell
> > > > that.
> > > 
> > > I think allprecursors() should be renamed to precursors(), and
> > > delete/rename
> > > the original precursors().
> > > 
> > > > But it
> > > > seems weird that the log predicate would care about revisions
> > > > not
> > > > in the  
> > > > repo.  Boris- it looks like '{precursors}' will skip the
> > > > obsolete
> > > > revision  
> > > > not on my system, and agrees with 'last(..)'.
> > > > 
> > 
> > I checked the precursors revset and the behavior is not the one
> > expected, thank you for pointing that.
> > 
> > 
> > I also realized that the revset is in Evolve and not in Core, which
> > makes it a good opportunity to upstream it.
> > 
> > I'm not sure if we should keep only the equivalent of
> > allprecursors(),
> > what about having a predecessor() and allpredecessors() revsets.
> > 
> > predecessors() would returns the closests locally known
> > predecessors,
> > like the {predecessors} template keyword while allpredecessors()
> > would
> > returns all the locally known predecessors.
> > 
> > What do you think?
> 
> I think a dedicated revset to find the first generation, for lack of
> a better word, of predecessors is useful.  I use it all the time to
> ensure rebase/evolve conflict resolution didn’t drop anything, though
> it only works if it wasn’t split or folded.

I think the term we used is closest predecessor, at least we have a
function named closestpredecessors in obsutil.py.

> There was a proposal for short handing something like this, so maybe
> a dedicated predicate won’t be needed in the future.  But it would be
> nice if the template and revset agreed, and I don’t see an “all”
> template as being as useful as the current one.
> 
> 
> http://mercurial.markmail.org/thread/sjnnwa43s4eksu62
> 

I was not aware of this proposal. I think a dedicated predicate would
be usefull too to have more verbose and more readable templates.

When you are speaking about an "all" tempalte, do you means that you
don't think an "allpredecessors" template would be usefull in core?
Yuya Nishihara - Jan. 16, 2018, 2:20 p.m.
On Tue, 16 Jan 2018 14:20:32 +0100, Boris Feld wrote:
> On Tue, 2018-01-16 at 08:05 -0500, Matt Harbison wrote:
> > On Jan 16, 2018, at 5:26 AM, Boris Feld <boris.feld@octobus.net>
> > wrote:
> > > I also realized that the revset is in Evolve and not in Core, which
> > > makes it a good opportunity to upstream it.
> > > 
> > > I'm not sure if we should keep only the equivalent of
> > > allprecursors(),
> > > what about having a predecessor() and allpredecessors() revsets.
> > > 
> > > predecessors() would returns the closests locally known
> > > predecessors,
> > > like the {predecessors} template keyword while allpredecessors()
> > > would
> > > returns all the locally known predecessors.
> > > 
> > > What do you think?
> > 
> > I think a dedicated revset to find the first generation, for lack of
> > a better word, of predecessors is useful.  I use it all the time to
> > ensure rebase/evolve conflict resolution didn’t drop anything, though
> > it only works if it wasn’t split or folded.
> 
> I think the term we used is closest predecessor, at least we have a
> function named closestpredecessors in obsutil.py.
> 
> > There was a proposal for short handing something like this, so maybe
> > a dedicated predicate won’t be needed in the future.  But it would be
> > nice if the template and revset agreed, and I don’t see an “all”
> > template as being as useful as the current one.
> > 
> > 
> > http://mercurial.markmail.org/thread/sjnnwa43s4eksu62
> > 
> 
> I was not aware of this proposal. I think a dedicated predicate would
> be usefull too to have more verbose and more readable templates.

Perhaps.

For the record, the core already has successors() which behaves more like
allsuccessors() in evolve. Maybe it could have depth=n option?
Matt Harbison - Jan. 17, 2018, 2:42 a.m.
On Tue, 16 Jan 2018 08:20:32 -0500, Boris Feld <boris.feld@octobus.net>  
wrote:

> On Tue, 2018-01-16 at 08:05 -0500, Matt Harbison wrote:
>> On Jan 16, 2018, at 5:26 AM, Boris Feld <boris.feld@octobus.net>
>> wrote:
>>
>> > I checked the precursors revset and the behavior is not the one
>> > expected, thank you for pointing that.
>> >
>> >
>> > I also realized that the revset is in Evolve and not in Core, which
>> > makes it a good opportunity to upstream it.
>> >
>> > I'm not sure if we should keep only the equivalent of
>> > allprecursors(),
>> > what about having a predecessor() and allpredecessors() revsets.
>> >
>> > predecessors() would returns the closests locally known
>> > predecessors,
>> > like the {predecessors} template keyword while allpredecessors()
>> > would
>> > returns all the locally known predecessors.
>> >
>> > What do you think?
>>
>> I think a dedicated revset to find the first generation, for lack of
>> a better word, of predecessors is useful.  I use it all the time to
>> ensure rebase/evolve conflict resolution didn’t drop anything, though
>> it only works if it wasn’t split or folded.
>
> I think the term we used is closest predecessor, at least we have a
> function named closestpredecessors in obsutil.py.
>
>> There was a proposal for short handing something like this, so maybe
>> a dedicated predicate won’t be needed in the future.  But it would be
>> nice if the template and revset agreed, and I don’t see an “all”
>> template as being as useful as the current one.
>>
>>
>> http://mercurial.markmail.org/thread/sjnnwa43s4eksu62
>>
>
> I was not aware of this proposal. I think a dedicated predicate would
> be usefull too to have more verbose and more readable templates.
>
> When you are speaking about an "all" tempalte, do you means that you
> don't think an "allpredecessors" template would be usefull in core?

I meant "allpredecessors", yes.

I'm not against adding it, I just can't immediately think of a use for a  
list that could encompass splits -> folds -> resplits -> etc.  I'm fine  
with Yuya's idea of a limit parameter, though that's more typing so a  
shortcut would be nice.  I tend to spell out templates/args/etc in scripts  
for clarity, but that's a tiny fraction of my usage.

Patch

diff --git a/hgext/lfs/__init__.py b/hgext/lfs/__init__.py
--- a/hgext/lfs/__init__.py
+++ b/hgext/lfs/__init__.py
@@ -19,8 +19,23 @@ 
     # (default: unset)
     url = https://example.com/lfs
 
-    # size of a file to make it use LFS
-    threshold = 10M
+    # Which files to track in LFS.  Path tests are "**.extname" for file
+    # extensions, and "path:under/some/directory" for path prefix.  Both
+    # are relative to the repository root, and the latter must be quoted.
+    # File size can be tested with the "size()" fileset, and tests can be
+    # joined with fileset operators.  (See "hg help filesets.operators".)
+    #
+    # Some examples:
+    # - all()                       # everything
+    # - none()                      # nothing
+    # - size(">20MB")               # larger than 20MB
+    # - !**.txt                     # anything not a *.txt file
+    # - **.zip | **.tar.gz | **.7z  # some types of compressed files
+    # - "path:bin"                  # files under "bin" in the project root
+    # - (**.php & size(">2MB")) | (**.js & size(">5MB")) | **.tar.gz
+    #     | ("path:bin" & !"path:/bin/README") | size(">1GB")
+    # (default: none())
+    track = size(">10M")
 
     # how many times to retry before giving up on transferring an object
     retry = 5
@@ -43,6 +58,7 @@ 
     filelog,
     hg,
     localrepo,
+    minifileset,
     node,
     registrar,
     revlog,
@@ -76,9 +92,13 @@ 
 configitem('lfs', 'usercache',
     default=None,
 )
+# Deprecated
 configitem('lfs', 'threshold',
     default=None,
 )
+configitem('lfs', 'track',
+    default='none()',
+)
 configitem('lfs', 'retry',
     default=5,
 )
@@ -100,9 +120,14 @@ 
     if not repo.local():
         return
 
-    threshold = repo.ui.configbytes('lfs', 'threshold')
+    trackspec = repo.ui.config('lfs', 'track')
 
-    repo.svfs.options['lfsthreshold'] = threshold
+    # deprecated config: lfs.threshold
+    threshold = repo.ui.configbytes('lfs', 'threshold')
+    if threshold:
+        trackspec = "(%s) | size('>%s')" % (trackspec, threshold)
+
+    repo.svfs.options['lfstrack'] = minifileset.compile(trackspec)
     repo.svfs.lfslocalblobstore = blobstore.local(repo)
     repo.svfs.lfsremoteblobstore = blobstore.remote(repo)
 
diff --git a/hgext/lfs/wrapper.py b/hgext/lfs/wrapper.py
--- a/hgext/lfs/wrapper.py
+++ b/hgext/lfs/wrapper.py
@@ -123,14 +123,14 @@ 
 def filelogaddrevision(orig, self, text, transaction, link, p1, p2,
                        cachedelta=None, node=None,
                        flags=revlog.REVIDX_DEFAULT_FLAGS, **kwds):
-    threshold = self.opener.options['lfsthreshold']
     textlen = len(text)
     # exclude hg rename meta from file size
     meta, offset = filelog.parsemeta(text)
     if offset:
         textlen -= offset
 
-    if threshold and textlen > threshold:
+    lfstrack = self.opener.options['lfstrack']
+    if lfstrack(self.filename, textlen):
         flags |= revlog.REVIDX_EXTSTORED
 
     return orig(self, text, transaction, link, p1, p2, cachedelta=cachedelta,
diff --git a/tests/test-lfs-test-server.t b/tests/test-lfs-test-server.t
--- a/tests/test-lfs-test-server.t
+++ b/tests/test-lfs-test-server.t
@@ -30,7 +30,7 @@ 
   > lfs=
   > [lfs]
   > url=http://foo:bar@$LFS_HOST/
-  > threshold=1
+  > track=all()
   > EOF
 
   $ hg init repo1
diff --git a/tests/test-lfs.t b/tests/test-lfs.t
--- a/tests/test-lfs.t
+++ b/tests/test-lfs.t
@@ -4,6 +4,7 @@ 
   > [extensions]
   > lfs=
   > [lfs]
+  > # Test deprecated config
   > threshold=1000B
   > EOF
 
@@ -140,7 +141,7 @@ 
   $ cd repo3
   $ cat >> .hg/hgrc << EOF
   > [lfs]
-  > threshold=10B
+  > track=size(">10B")
   > EOF
 
   $ echo LONGER-THAN-TEN-BYTES-WILL-TRIGGER-LFS > large
@@ -203,7 +204,7 @@ 
   $ cd repo6
   $ cat >> .hg/hgrc << EOF
   > [lfs]
-  > threshold=30B
+  > track=size(">30B")
   > EOF
 
   $ echo LARGE-BECAUSE-IT-IS-MORE-THAN-30-BYTES > large
@@ -239,7 +240,7 @@ 
   $ cd repo8
   $ cat >> .hg/hgrc << EOF
   > [lfs]
-  > threshold=10B
+  > track=size(">10B")
   > EOF
 
   $ echo THIS-IS-LFS-BECAUSE-10-BYTES > a1
@@ -320,7 +321,7 @@ 
   $ cd repo9
   $ cat >> .hg/hgrc << EOF
   > [lfs]
-  > threshold=10B
+  > track=size(">10B")
   > [diff]
   > git=1
   > EOF
@@ -454,7 +455,7 @@ 
   > [extensions]
   > lfs=
   > [lfs]
-  > threshold=1
+  > track=all()
   > EOF
   $ $PYTHON <<'EOF'
   > def write(path, content):
@@ -542,6 +543,47 @@ 
 
   $ cd ..
 
+# Test filter
+
+  $ hg init repo11
+  $ cd repo11
+  $ cat >> .hg/hgrc << EOF
+  > [lfs]
+  > track=(**.a & size(">5B")) | (**.b & !size(">5B"))
+  >      | (**.c & "path:d" & !"path:d/c.c") | size(">10B")
+  > EOF
+
+  $ mkdir a
+  $ echo aaaaaa > a/1.a
+  $ echo a > a/2.a
+  $ echo aaaaaa > 1.b
+  $ echo a > 2.b
+  $ echo a > 1.c
+  $ mkdir d
+  $ echo a > d/c.c
+  $ echo a > d/d.c
+  $ echo aaaaaaaaaaaa > x
+  $ hg add . -q
+  $ hg commit -m files
+
+  $ for p in a/1.a a/2.a 1.b 2.b 1.c d/c.c d/d.c x; do
+  >   if hg debugdata $p 0 2>&1 | grep git-lfs >/dev/null; then
+  >     echo "${p}: is lfs"
+  >   else
+  >     echo "${p}: not lfs"
+  >   fi
+  > done
+  a/1.a: is lfs
+  a/2.a: not lfs
+  1.b: not lfs
+  2.b: is lfs
+  1.c: not lfs
+  d/c.c: not lfs
+  d/d.c: is lfs
+  x: is lfs
+
+  $ cd ..
+
 # Verify the repos
 
   $ cat > $TESTTMP/dumpflog.py << EOF