Submitter | Mads Kiilerich |
---|---|
Date | Oct. 8, 2014, 12:22 a.m. |
Message ID | <7c48c97a07b865c86a75.1412727760@ssl.google-analytics.com> |
Download | mbox | patch |
Permalink | /patch/6150/ |
State | Changes Requested |
Headers | show |
Comments
On 10/07/2014 05:22 PM, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich <madski@unity3d.com> > # Date 1412727753 -7200 > # Wed Oct 08 02:22:33 2014 +0200 > # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 > # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 > revset: introduce optional 'while' predicate for ancestors() > > When specifying a 'while' set, ancestors() will now only visit parents that are > in that set. This makes it possible to prune while doing an ancestor traversal > and reduce the number of membership tests. Such a pruning is very convenient > when expensive checks are involved. > > The primary initial use case for this feature is that filtering on branch name > is so expensive. Often it is just as relevant to prune everything not on the > branch. Feature seems interresting. However ther is a massive refactoring on revset in progress. I'll look at the patch after the end of the refactoring landed (opfully tomorrow).
On 10/08/2014 03:27 AM, Pierre-Yves David wrote: > > > On 10/07/2014 05:22 PM, Mads Kiilerich wrote: >> # HG changeset patch >> # User Mads Kiilerich <madski@unity3d.com> >> # Date 1412727753 -7200 >> # Wed Oct 08 02:22:33 2014 +0200 >> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 >> # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 >> revset: introduce optional 'while' predicate for ancestors() >> >> When specifying a 'while' set, ancestors() will now only visit >> parents that are >> in that set. This makes it possible to prune while doing an ancestor >> traversal >> and reduce the number of membership tests. Such a pruning is very >> convenient >> when expensive checks are involved. >> >> The primary initial use case for this feature is that filtering on >> branch name >> is so expensive. Often it is just as relevant to prune everything not >> on the >> branch. > > Feature seems interresting. However ther is a massive refactoring on > revset in progress. I'll look at the patch after the end of the > refactoring landed (opfully tomorrow). Any news on this ... or general thoughts on it from others? Do you guys agree such ancestor pruning is relevant? Can you imagine other kinds of pruning than when yielding ancestors? Some related queries could be max(::.&branch(default)) or min(.::&branch(stable)), but I assume the optimizer should be able to figure out it has iterate in the right order and 'prune' after finding the first match. Expressions with heads() and roots() are more tricky to optimize optimatically but should also be possible without special syntax / function parameters. /Mads
On 10/07/2014 05:22 PM, Mads Kiilerich wrote: > # HG changeset patch > # User Mads Kiilerich <madski@unity3d.com> > # Date 1412727753 -7200 > # Wed Oct 08 02:22:33 2014 +0200 > # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 > # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 > revset: introduce optional 'while' predicate for ancestors() > > When specifying a 'while' set, ancestors() will now only visit parents that are > in that set. This makes it possible to prune while doing an ancestor traversal > and reduce the number of membership tests. Such a pruning is very convenient > when expensive checks are involved. Good feature, not really possible to achieve the same result easily (and definitely not with the same perf) > > The primary initial use case for this feature is that filtering on branch name > is so expensive. Often it is just as relevant to prune everything not on the > branch. > > Example: > > $ hg --time debugrevspec 'branch(.)' | wc -l > time: real 9.380 secs (user 9.200+0.000 sys 0.180+0.000) > 119 > > $ hg --time debugrevspec 'ancestors(.)&branch(.)' | wc -l > time: real 10.070 secs (user 9.940+0.000 sys 0.110+0.000) > 119 > > $ hg --time debugrevspec 'ancestors(., branch(.))' | wc -l > time: real 0.160 secs (user 0.140+0.000 sys 0.020+0.000) > 119 Would be interested in output from the perf extension. Probably not worth adding an entry in the official benchmark. But I may be wrong. Also please retry to the latest state of revset the code (pushed in the clowncopter 1h ago). Feel encourage to peek at the code path and optimize silly things you could find. > diff --git a/mercurial/revset.py b/mercurial/revset.py > --- a/mercurial/revset.py > +++ b/mercurial/revset.py > @@ -17,7 +17,7 @@ import obsolete as obsmod > import pathutil > import repoview > > -def _revancestors(repo, revs, followfirst): > +def _revancestors(repo, revs, followfirst, while_=None): > """Like revlog.ancestors(), but supports followfirst.""" > cut = followfirst and 1 or None > cl = repo.changelog > @@ -41,10 +41,11 @@ def _revancestors(repo, revs, followfirs > revsnode = revqueue.popleft() > heapq.heappush(h, -revsnode) > seen.add(current) > - yield current > - for parent in cl.parentrevs(current)[:cut]: > - if parent != node.nullrev: > - heapq.heappush(h, -parent) > + if while_ is None or current in while_: > + yield current > + for parent in cl.parentrevs(current)[:cut]: > + if parent != node.nullrev: > + heapq.heappush(h, -parent) > > return _generatorset(iterate(), iterasc=False) > > @@ -344,15 +345,22 @@ def ancestor(repo, subset, x): > return baseset([]) > > def _ancestors(repo, subset, x, followfirst=False): > - args = getset(repo, spanset(repo), x) > + args = getargs(x, 0, 2, _('ancestors takes no, one or two arguments')) > if not args: > return baseset([]) > - s = _revancestors(repo, args, followfirst) > + heads = getset(repo, _spanset(repo), args[0]) > + if not heads: > + return baseset([]) > + while_ = None > + if len(args) > 1: > + while_ = getset(repo, _spanset(repo), args[1]) _spanset(repo) is very very wrong. Should be spanset(repo) (or fullreposet(repo) as returned by the spanset call) > + s = _revancestors(repo, heads, followfirst, while_=while_) > return subset.filter(s.__contains__) Unrelated, but should be `subset && s`. I'll do a series about those. > def ancestors(repo, subset, x): > - """``ancestors(set)`` > + """``ancestors(set, [while])`` > Changesets that are ancestors of a changeset in set. > + If a while set is specified, only follow ancestors in that set. > """ > return _ancestors(repo, subset, x) > > diff --git a/tests/test-revset.t b/tests/test-revset.t > --- a/tests/test-revset.t > +++ b/tests/test-revset.t > @@ -246,6 +246,16 @@ ancestor can accept 0 or more arguments > 5 > $ log 'ancestor(ancestors(5))' > 0 > + > +ancestors can prune using the optional while expression > + > + $ hg log -T'{rev}\n' -r 'ancestors(5+9,1:7)' > + 1 > + 3 > + 5 > + > +other expressions ... > + > $ log 'author(bob)' > 2 > $ log 'author("re:bob|test")' > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > http://selenic.com/mailman/listinfo/mercurial-devel >
On 10/11/2014 02:41 AM, Pierre-Yves David wrote: > > > On 10/10/2014 07:12 AM, Mads Kiilerich wrote: >> On 10/08/2014 03:27 AM, Pierre-Yves David wrote: >>> >>> >>> On 10/07/2014 05:22 PM, Mads Kiilerich wrote: >>>> # HG changeset patch >>>> # User Mads Kiilerich <madski@unity3d.com> >>>> # Date 1412727753 -7200 >>>> # Wed Oct 08 02:22:33 2014 +0200 >>>> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 >>>> # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 >>>> revset: introduce optional 'while' predicate for ancestors() >>>> >>>> When specifying a 'while' set, ancestors() will now only visit >>>> parents that are >>>> in that set. This makes it possible to prune while doing an ancestor >>>> traversal >>>> and reduce the number of membership tests. Such a pruning is very >>>> convenient >>>> when expensive checks are involved. >>>> >>>> The primary initial use case for this feature is that filtering on >>>> branch name >>>> is so expensive. Often it is just as relevant to prune everything not >>>> on the >>>> branch. >>> >>> Feature seems interresting. However ther is a massive refactoring on >>> revset in progress. I'll look at the patch after the end of the >>> refactoring landed (opfully tomorrow). >> >> Any news on this ... or general thoughts on it from others? >> >> Do you guys agree such ancestor pruning is relevant? Can you imagine >> other kinds of pruning than when yielding ancestors? > > Could be useful for descendant too. For my primary use case of finding changesets on a branch, we don't need it. We already have the branch head and do thus know up front that all changesets after that can be pruned. That do not require new language support. But yes, it should be added to descendeants too for completeness and consistency. > Seems related to `only()` so maybe the implementation could be joined > or something. Hmm. Yes. only(X,Y) == ancestors(X,!::Y) . The computation of ::Y should already be lazy and optimized ... but probably not as much as the current implementation of only. > Seems also related to X::Y. That one requires backtracking or some kind of keeping track of traces of multiple paths and merge or discard them while iterating the DAG. I don't think there is anything to share with this feature. /Mads
On 10/11/2014 02:33 AM, Pierre-Yves David wrote: > On 10/07/2014 05:22 PM, Mads Kiilerich wrote: >> # HG changeset patch >> # User Mads Kiilerich <madski@unity3d.com> >> # Date 1412727753 -7200 >> # Wed Oct 08 02:22:33 2014 +0200 >> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 >> # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 >> revset: introduce optional 'while' predicate for ancestors() >> >> When specifying a 'while' set, ancestors() will now only visit >> parents that are >> in that set. This makes it possible to prune while doing an ancestor >> traversal >> and reduce the number of membership tests. Such a pruning is very >> convenient >> when expensive checks are involved. > > Good feature, not really possible to achieve the same result easily > (and definitely not with the same perf) > >> >> The primary initial use case for this feature is that filtering on >> branch name >> is so expensive. Often it is just as relevant to prune everything not >> on the >> branch. >> >> Example: >> >> $ hg --time debugrevspec 'branch(.)' | wc -l >> time: real 9.380 secs (user 9.200+0.000 sys 0.180+0.000) >> 119 >> >> $ hg --time debugrevspec 'ancestors(.)&branch(.)' | wc -l >> time: real 10.070 secs (user 9.940+0.000 sys 0.110+0.000) >> 119 >> >> $ hg --time debugrevspec 'ancestors(., branch(.))' | wc -l >> time: real 0.160 secs (user 0.140+0.000 sys 0.020+0.000) >> 119 > > Would be interested in output from the perf extension. Probably not > worth adding an entry in the official benchmark. But I may be wrong. What/how? The documentation of perf.py and its intended use is very sparse. There is also no mentioning of how revsetbenchmarks.txt should be used. The wiki gives no hits. I assume it is something that is related to some facebook internal setup using internal repos. >> diff --git a/mercurial/revset.py b/mercurial/revset.py >> --- a/mercurial/revset.py >> +++ b/mercurial/revset.py >> @@ -17,7 +17,7 @@ import obsolete as obsmod >> import pathutil >> import repoview >> >> -def _revancestors(repo, revs, followfirst): >> +def _revancestors(repo, revs, followfirst, while_=None): >> """Like revlog.ancestors(), but supports followfirst.""" >> cut = followfirst and 1 or None >> cl = repo.changelog >> @@ -41,10 +41,11 @@ def _revancestors(repo, revs, followfirs >> revsnode = revqueue.popleft() >> heapq.heappush(h, -revsnode) >> seen.add(current) >> - yield current >> - for parent in cl.parentrevs(current)[:cut]: >> - if parent != node.nullrev: >> - heapq.heappush(h, -parent) >> + if while_ is None or current in while_: >> + yield current >> + for parent in cl.parentrevs(current)[:cut]: >> + if parent != node.nullrev: >> + heapq.heappush(h, -parent) >> >> return _generatorset(iterate(), iterasc=False) >> >> @@ -344,15 +345,22 @@ def ancestor(repo, subset, x): >> return baseset([]) >> >> def _ancestors(repo, subset, x, followfirst=False): >> - args = getset(repo, spanset(repo), x) >> + args = getargs(x, 0, 2, _('ancestors takes no, one or two >> arguments')) >> if not args: >> return baseset([]) >> - s = _revancestors(repo, args, followfirst) >> + heads = getset(repo, _spanset(repo), args[0]) >> + if not heads: >> + return baseset([]) >> + while_ = None >> + if len(args) > 1: >> + while_ = getset(repo, _spanset(repo), args[1]) > > _spanset(repo) is very very wrong. Should be spanset(repo) (or > fullreposet(repo) as returned by the spanset call) Why? The spanset docstring says the opposite. It suggests that all use of spanset should replaced by either fullreposet or _spanset. /Mads
On 10/11/2014 05:45 PM, Mads Kiilerich wrote: > On 10/11/2014 02:33 AM, Pierre-Yves David wrote: >> On 10/07/2014 05:22 PM, Mads Kiilerich wrote: >>> # HG changeset patch >>> # User Mads Kiilerich <madski@unity3d.com> >>> # Date 1412727753 -7200 >>> # Wed Oct 08 02:22:33 2014 +0200 >>> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 >>> # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 >>> revset: introduce optional 'while' predicate for ancestors() >>> >>> When specifying a 'while' set, ancestors() will now only visit >>> parents that are >>> in that set. This makes it possible to prune while doing an ancestor >>> traversal >>> and reduce the number of membership tests. Such a pruning is very >>> convenient >>> when expensive checks are involved. >> >> Good feature, not really possible to achieve the same result easily >> (and definitely not with the same perf) >> >>> >>> The primary initial use case for this feature is that filtering on >>> branch name >>> is so expensive. Often it is just as relevant to prune everything not >>> on the >>> branch. >>> >>> Example: >>> >>> $ hg --time debugrevspec 'branch(.)' | wc -l >>> time: real 9.380 secs (user 9.200+0.000 sys 0.180+0.000) >>> 119 >>> >>> $ hg --time debugrevspec 'ancestors(.)&branch(.)' | wc -l >>> time: real 10.070 secs (user 9.940+0.000 sys 0.110+0.000) >>> 119 >>> >>> $ hg --time debugrevspec 'ancestors(., branch(.))' | wc -l >>> time: real 0.160 secs (user 0.140+0.000 sys 0.020+0.000) >>> 119 >> >> Would be interested in output from the perf extension. Probably not >> worth adding an entry in the official benchmark. But I may be wrong. > > What/how? The documentation of perf.py and its intended use is very > sparse. yeah, sure, the documentation is hard to get: $ hg help perfrevset hg perfrevset REVSET benchmark the execution time of a revset […] <sarcasm/> Joke aside I'm surprise you have never heard of this 6 year old extension. It host all kind of command dedicated to testing specific part of the mercurial code. > There is also no mentioning of how revsetbenchmarks.txt should > be used. May you could look at the revsetbenchmarks.py file right next to ;-) > The wiki gives no hits. I assume it is something that is > related to some facebook internal setup using internal repos. Yeah, sure, as it was facebook internal we put it in the Mercurial public repo. And then we had people external to Facebook update its content and run it ;-) revsetbenchmarks.py is a scrip to run perfrevset (you, again!) for a list of revset against a set of revision (you are not so interested in the revision part. $ ./revsetbenchmark.py <revs-to-be-tested> [-f <file-to-read-revsets-from>] if -f is omitted revsets are read from stdin. revsetbenchmark.txt is just a list of interesting revsets we want tested on a regular basis. >>> diff --git a/mercurial/revset.py b/mercurial/revset.py >>> --- a/mercurial/revset.py >>> +++ b/mercurial/revset.py >>> @@ -17,7 +17,7 @@ import obsolete as obsmod >>> import pathutil >>> import repoview >>> >>> -def _revancestors(repo, revs, followfirst): >>> +def _revancestors(repo, revs, followfirst, while_=None): >>> """Like revlog.ancestors(), but supports followfirst.""" >>> cut = followfirst and 1 or None >>> cl = repo.changelog >>> @@ -41,10 +41,11 @@ def _revancestors(repo, revs, followfirs >>> revsnode = revqueue.popleft() >>> heapq.heappush(h, -revsnode) >>> seen.add(current) >>> - yield current >>> - for parent in cl.parentrevs(current)[:cut]: >>> - if parent != node.nullrev: >>> - heapq.heappush(h, -parent) >>> + if while_ is None or current in while_: >>> + yield current >>> + for parent in cl.parentrevs(current)[:cut]: >>> + if parent != node.nullrev: >>> + heapq.heappush(h, -parent) >>> >>> return _generatorset(iterate(), iterasc=False) >>> >>> @@ -344,15 +345,22 @@ def ancestor(repo, subset, x): >>> return baseset([]) >>> >>> def _ancestors(repo, subset, x, followfirst=False): >>> - args = getset(repo, spanset(repo), x) >>> + args = getargs(x, 0, 2, _('ancestors takes no, one or two >>> arguments')) >>> if not args: >>> return baseset([]) >>> - s = _revancestors(repo, args, followfirst) >>> + heads = getset(repo, _spanset(repo), args[0]) >>> + if not heads: >>> + return baseset([]) >>> + while_ = None >>> + if len(args) > 1: >>> + while_ = getset(repo, _spanset(repo), args[1]) >> >> _spanset(repo) is very very wrong. Should be spanset(repo) (or >> fullreposet(repo) as returned by the spanset call) > > Why? The spanset docstring says the opposite. It suggests that all use > of spanset should replaced by either fullreposet or _spanset. Because by doing so, you bypass the fullreposet optimisation. If you want a spanset over the whole repo, use fullreposet. see http://selenic.com/hg/rev/fbae659543cf (and related changes for details)
On Fri, 2014-10-10 at 16:12 +0200, Mads Kiilerich wrote: > On 10/08/2014 03:27 AM, Pierre-Yves David wrote: > > > > > > On 10/07/2014 05:22 PM, Mads Kiilerich wrote: > >> # HG changeset patch > >> # User Mads Kiilerich <madski@unity3d.com> > >> # Date 1412727753 -7200 > >> # Wed Oct 08 02:22:33 2014 +0200 > >> # Node ID 7c48c97a07b865c86a75562f94656a64a8506273 > >> # Parent 564ae7d2ec9bee86b00a6ba817271ac0b19deca7 > >> revset: introduce optional 'while' predicate for ancestors() > >> > >> When specifying a 'while' set, ancestors() will now only visit > >> parents that are > >> in that set. This makes it possible to prune while doing an ancestor > >> traversal > >> and reduce the number of membership tests. Such a pruning is very > >> convenient > >> when expensive checks are involved. > >> > >> The primary initial use case for this feature is that filtering on > >> branch name > >> is so expensive. Often it is just as relevant to prune everything not > >> on the > >> branch. > > > > Feature seems interresting. However ther is a massive refactoring on > > revset in progress. I'll look at the patch after the end of the > > refactoring landed (opfully tomorrow). > > Any news on this ... or general thoughts on it from others? Let's have this discussion next month.
Patch
diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -17,7 +17,7 @@ import obsolete as obsmod import pathutil import repoview -def _revancestors(repo, revs, followfirst): +def _revancestors(repo, revs, followfirst, while_=None): """Like revlog.ancestors(), but supports followfirst.""" cut = followfirst and 1 or None cl = repo.changelog @@ -41,10 +41,11 @@ def _revancestors(repo, revs, followfirs revsnode = revqueue.popleft() heapq.heappush(h, -revsnode) seen.add(current) - yield current - for parent in cl.parentrevs(current)[:cut]: - if parent != node.nullrev: - heapq.heappush(h, -parent) + if while_ is None or current in while_: + yield current + for parent in cl.parentrevs(current)[:cut]: + if parent != node.nullrev: + heapq.heappush(h, -parent) return _generatorset(iterate(), iterasc=False) @@ -344,15 +345,22 @@ def ancestor(repo, subset, x): return baseset([]) def _ancestors(repo, subset, x, followfirst=False): - args = getset(repo, spanset(repo), x) + args = getargs(x, 0, 2, _('ancestors takes no, one or two arguments')) if not args: return baseset([]) - s = _revancestors(repo, args, followfirst) + heads = getset(repo, _spanset(repo), args[0]) + if not heads: + return baseset([]) + while_ = None + if len(args) > 1: + while_ = getset(repo, _spanset(repo), args[1]) + s = _revancestors(repo, heads, followfirst, while_=while_) return subset.filter(s.__contains__) def ancestors(repo, subset, x): - """``ancestors(set)`` + """``ancestors(set, [while])`` Changesets that are ancestors of a changeset in set. + If a while set is specified, only follow ancestors in that set. """ return _ancestors(repo, subset, x) diff --git a/tests/test-revset.t b/tests/test-revset.t --- a/tests/test-revset.t +++ b/tests/test-revset.t @@ -246,6 +246,16 @@ ancestor can accept 0 or more arguments 5 $ log 'ancestor(ancestors(5))' 0 + +ancestors can prune using the optional while expression + + $ hg log -T'{rev}\n' -r 'ancestors(5+9,1:7)' + 1 + 3 + 5 + +other expressions ... + $ log 'author(bob)' 2 $ log 'author("re:bob|test")'