Submitter | Durham Goode |
---|---|
Date | Feb. 5, 2016, 8:47 p.m. |
Message ID | <4bf06bc950fcbee8fc65.1454705248@dev8486.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/13016/ |
State | Superseded |
Commit | 86c4cbdaffeeb75b9f0298774ce6b3db8d5aa0aa |
Headers | show |
Comments
Durham Goode <durham@fb.com> writes: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1454705079 28800 > # Fri Feb 05 12:44:39 2016 -0800 > # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9 > # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 > revset: use manifest.matches in _follow revset > > The old _follow revset iterated over every file in the commit and checked if it > matched. For repos with large manifests, this could take 500ms. By switching to > use manifest.matches() we can take advantage of the fastpaths built in to > manifest.py that allows iterating over only the files in the matcher when it's a > simple matcher. This brings the time spent down from 500ms to 0ms during simple > operations like 'hg log -f file.txt'. This looks great to me. Awesome speed up.
On Fri, Feb 5, 2016 at 12:47 PM, Durham Goode <durham@fb.com> wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1454705079 28800 > # Fri Feb 05 12:44:39 2016 -0800 > # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9 > # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 > revset: use manifest.matches in _follow revset > > The old _follow revset iterated over every file in the commit and checked if it > matched. For repos with large manifests, this could take 500ms. By switching to > use manifest.matches() we can take advantage of the fastpaths built in to > manifest.py that allows iterating over only the files in the matcher when it's a > simple matcher. This brings the time spent down from 500ms to 0ms during simple > operations like 'hg log -f file.txt'. Nice! > > diff --git a/mercurial/revset.py b/mercurial/revset.py > --- a/mercurial/revset.py > +++ b/mercurial/revset.py > @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo > matcher = matchmod.match(repo.root, repo.getcwd(), [x], > ctx=repo[None], default='path') > > + files = c.manifest().matches(matcher) I think you can even use c.manifest.walk(matcher) here to avoid creating the manifest. Obviously we don't need it to be faster than 0ms, but perhaps it would be noticeable when many files are matched. I'll leave it to you to test or just dismiss that. > + > s = set() > - for fname in c: > - if matcher(fname): > - fctx = c[fname] > - s = s.union(set(c.rev() for c in fctx.ancestors(followfirst))) > - # include the revision responsible for the most recent version > - s.add(fctx.introrev()) > + for fname in files: > + fctx = c[fname] > + s = s.union(set(c.rev() for c in fctx.ancestors(followfirst))) > + # include the revision responsible for the most recent version > + s.add(fctx.introrev()) > else: > s = _revancestors(repo, baseset([c.rev()]), followfirst) > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 2/5/16 1:22 PM, Martin von Zweigbergk wrote: > On Fri, Feb 5, 2016 at 12:47 PM, Durham Goode <durham@fb.com> wrote: >> # HG changeset patch >> # User Durham Goode <durham@fb.com> >> # Date 1454705079 28800 >> # Fri Feb 05 12:44:39 2016 -0800 >> # Node ID 4bf06bc950fcbee8fc65cd3fd17d366a27603ad9 >> # Parent 01a5143cd25f285f8c745a92986cd7186bb32c90 >> revset: use manifest.matches in _follow revset >> >> The old _follow revset iterated over every file in the commit and checked if it >> matched. For repos with large manifests, this could take 500ms. By switching to >> use manifest.matches() we can take advantage of the fastpaths built in to >> manifest.py that allows iterating over only the files in the matcher when it's a >> simple matcher. This brings the time spent down from 500ms to 0ms during simple >> operations like 'hg log -f file.txt'. > Nice! > >> diff --git a/mercurial/revset.py b/mercurial/revset.py >> --- a/mercurial/revset.py >> +++ b/mercurial/revset.py >> @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo >> matcher = matchmod.match(repo.root, repo.getcwd(), [x], >> ctx=repo[None], default='path') >> >> + files = c.manifest().matches(matcher) > I think you can even use c.manifest.walk(matcher) here to avoid > creating the manifest. Obviously we don't need it to be faster than > 0ms, but perhaps it would be noticeable when many files are matched. > I'll leave it to you to test or just dismiss that. > Interesting. It's probably doesn't matter for this revset, but it's a good idea regardless. I'll send a V2 with that.
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -1086,13 +1086,14 @@ def _follow(repo, subset, x, name, follo matcher = matchmod.match(repo.root, repo.getcwd(), [x], ctx=repo[None], default='path') + files = c.manifest().matches(matcher) + s = set() - for fname in c: - if matcher(fname): - fctx = c[fname] - s = s.union(set(c.rev() for c in fctx.ancestors(followfirst))) - # include the revision responsible for the most recent version - s.add(fctx.introrev()) + for fname in files: + fctx = c[fname] + s = s.union(set(c.rev() for c in fctx.ancestors(followfirst))) + # include the revision responsible for the most recent version + s.add(fctx.introrev()) else: s = _revancestors(repo, baseset([c.rev()]), followfirst)