Submitter | Durham Goode |
---|---|
Date | Nov. 8, 2015, 1:20 a.m. |
Message ID | <7c680f209f7af35c7c47.1446945611@dev8060.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/11327/ |
State | Rejected |
Delegated to: | Pierre-Yves David |
Headers | show |
Comments
On 11/7/15 5:20 PM, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf > > The old algorithm was a revset "::X and obsolete()". This was inefficient because > it requires walking all the way down the ancestor chain (since the revset did > not know it could stop walking at public nodes). > > The new algorithm uses changelog.ancestors() directly and provides a function to > stop the iteration when we reach a public node. During a commit to a repo with > hundreds of thousands of commits, this change reduces the inhibitmarkers time > from 1.5 seconds to effectively 0 seconds. > This patch is dependent on the pending "ancestors: add stopfunc to revlog.ancestors" patch being accepted into core.
On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham@fb.com> wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf > > The old algorithm was a revset "::X and obsolete()". This was inefficient > because > it requires walking all the way down the ancestor chain (since the revset > did > not know it could stop walking at public nodes). > I was hoping to reproduce the slowness on the Mozilla repo (270k revisions), but "hg log -r '::tip and obsolete()'" runs in 180 ms. Do you have a better command for me to try? How many obsmarkers in the repo you tried it on? > The new algorithm uses changelog.ancestors() directly and provides a > function to > stop the iteration when we reach a public node. During a commit to a repo > with > hundreds of thousands of commits, this change reduces the inhibitmarkers > time > from 1.5 seconds to effectively 0 seconds. > > diff --git a/hgext/inhibit.py b/hgext/inhibit.py > --- a/hgext/inhibit.py > +++ b/hgext/inhibit.py > @@ -23,6 +23,7 @@ from mercurial import scmutil > from mercurial import commands > from mercurial import lock as lockmod > from mercurial import bookmarks > +from mercurial import phases > from mercurial import util > from mercurial.i18n import _ > > @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): > if not _inhibitenabled(repo): > return > > - newinhibit = repo.set('::%ln and obsolete()', nodes) > + newinhibit = set() > + revs = [repo[node].rev() for node in nodes] > + phase = repo._phasecache.phase > + obsoletes = obsolete.getrevs(repo, 'obsolete') > + for anc in repo.changelog.ancestors(revs, inclusive=True, > + stopfunc=lambda rev: phase(repo, rev) == phases.public): > + if anc in obsoletes: > + newinhibit.add(repo[anc]) > + > if newinhibit: > lock = tr = None > try: > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel >
On 11/7/15 9:52 PM, Martin von Zweigbergk wrote: > > > On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham@fb.com > <mailto:durham@fb.com>> wrote: > > # HG changeset patch > # User Durham Goode <durham@fb.com <mailto:durham@fb.com>> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf > > The old algorithm was a revset "::X and obsolete()". This was > inefficient because > it requires walking all the way down the ancestor chain (since the > revset did > not know it could stop walking at public nodes). > > > I was hoping to reproduce the slowness on the Mozilla repo (270k > revisions), but "hg log -r '::tip and obsolete()'" runs in 180 ms. Do > you have a better command for me to try? How many obsmarkers in the > repo you tried it on? You need to make sure you pass --hidden, otherwise the obsolete() revset resolves to an empty set and tests against it are cheap. I have 1500 obs markers, if that affects things.
On Sat, Nov 7, 2015 at 10:15 PM Durham Goode <durham@fb.com> wrote: > > > On 11/7/15 9:52 PM, Martin von Zweigbergk wrote: > > > > On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham@fb.com> wrote: > >> # HG changeset patch >> # User Durham Goode <durham@fb.com> >> # Date 1446945001 28800 >> # Sat Nov 07 17:10:01 2015 -0800 >> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >> inhibit: improve transaction marker perf >> >> The old algorithm was a revset "::X and obsolete()". This was inefficient >> because >> it requires walking all the way down the ancestor chain (since the revset >> did >> not know it could stop walking at public nodes). >> > > I was hoping to reproduce the slowness on the Mozilla repo (270k > revisions), but "hg log -r '::tip and obsolete()'" runs in 180 ms. Do you > have a better command for me to try? How many obsmarkers in the repo you > tried it on? > > You need to make sure you pass --hidden, otherwise the obsolete() revset > resolves to an empty set and tests against it are cheap. > I'm still not able to reproduce this :-( Some tests take ~800 ms, but most of that time seems to be related to loading obsmarkers and not about iterating over the revset. I wanted to see if I could get the same results by playing with the revset optimizer. I have never looked at that code before and I don't know if it's a good idea. If it is, I'll have to let you do it yourself since I can't even test it. > > I have 1500 obs markers, if that affects things. >
On Sat, Nov 7, 2015 at 5:20 PM, Durham Goode <durham@fb.com> wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf > > The old algorithm was a revset "::X and obsolete()". This was inefficient > because > it requires walking all the way down the ancestor chain (since the revset > did > not know it could stop walking at public nodes). > > The new algorithm uses changelog.ancestors() directly and provides a > function to > stop the iteration when we reach a public node. During a commit to a repo > with > hundreds of thousands of commits, this change reduces the inhibitmarkers > time > from 1.5 seconds to effectively 0 seconds. > > diff --git a/hgext/inhibit.py b/hgext/inhibit.py > --- a/hgext/inhibit.py > +++ b/hgext/inhibit.py > @@ -23,6 +23,7 @@ from mercurial import scmutil > from mercurial import commands > from mercurial import lock as lockmod > from mercurial import bookmarks > +from mercurial import phases > from mercurial import util > from mercurial.i18n import _ > > @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): > if not _inhibitenabled(repo): > return > > - newinhibit = repo.set('::%ln and obsolete()', nodes) > + newinhibit = set() > + revs = [repo[node].rev() for node in nodes] > + phase = repo._phasecache.phase > + obsoletes = obsolete.getrevs(repo, 'obsolete') > + for anc in repo.changelog.ancestors(revs, inclusive=True, > + stopfunc=lambda rev: phase(repo, rev) == phases.public): > + if anc in obsoletes: > + newinhibit.add(repo[anc]) > + > > I must be missing something obvious: why can't this be written as: obsoletes = ... for rev in repo.revs('::%ln and not public()'): if rev in obsoletes: ... I don't understand why stopfunc is needed and why a revset can't do the same thing.
On 11/8/15 8:28 AM, Gregory Szorc wrote: > On Sat, Nov 7, 2015 at 5:20 PM, Durham Goode <durham@fb.com > <mailto:durham@fb.com>> wrote: > > # HG changeset patch > # User Durham Goode <durham@fb.com <mailto:durham@fb.com>> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf > > The old algorithm was a revset "::X and obsolete()". This was > inefficient because > it requires walking all the way down the ancestor chain (since the > revset did > not know it could stop walking at public nodes). > > The new algorithm uses changelog.ancestors() directly and provides > a function to > stop the iteration when we reach a public node. During a commit to > a repo with > hundreds of thousands of commits, this change reduces the > inhibitmarkers time > from 1.5 seconds to effectively 0 seconds. > > diff --git a/hgext/inhibit.py b/hgext/inhibit.py > --- a/hgext/inhibit.py > +++ b/hgext/inhibit.py > @@ -23,6 +23,7 @@ from mercurial import scmutil > from mercurial import commands > from mercurial import lock as lockmod > from mercurial import bookmarks > +from mercurial import phases > from mercurial import util > from mercurial.i18n import _ > > @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): > if not _inhibitenabled(repo): > return > > - newinhibit = repo.set('::%ln and obsolete()', nodes) > + newinhibit = set() > + revs = [repo[node].rev() for node in nodes] > + phase = repo._phasecache.phase > + obsoletes = obsolete.getrevs(repo, 'obsolete') > + for anc in repo.changelog.ancestors(revs, inclusive=True, > + stopfunc=lambda rev: phase(repo, rev) == > phases.public): > + if anc in obsoletes: > + newinhibit.add(repo[anc]) > + > > > I must be missing something obvious: why can't this be written as: > > obsoletes = ... > for rev in repo.revs('::%ln and not public()'): > if rev in obsoletes: > ... > > I don't understand why stopfunc is needed and why a revset can't do > the same thing. The way "::X and not public()" works is it iterates down the ancestor line and checks each commit to see if it's not public. The iterator has no knowledge that finding something that is public means it can stop iterating, so it will iterate over all the commits. In theory we could try to make revsets smarter (i.e. by allowing the ancestor revset to communicate to the public revset that it's doing a descendant traversal, and allowing the public revset to signal a stop), but that's rather complex, and the implementation would probably end up adding the stopfunc to lazyancestors anyway (since that's how you'd likely implement it).
On Sun, Nov 8, 2015 at 8:38 AM, Durham Goode <durham@fb.com> wrote: > > > On 11/8/15 8:28 AM, Gregory Szorc wrote: > > On Sat, Nov 7, 2015 at 5:20 PM, Durham Goode <durham@fb.com> wrote: > >> # HG changeset patch >> # User Durham Goode <durham@fb.com> >> # Date 1446945001 28800 >> # Sat Nov 07 17:10:01 2015 -0800 >> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >> inhibit: improve transaction marker perf >> >> The old algorithm was a revset "::X and obsolete()". This was inefficient >> because >> it requires walking all the way down the ancestor chain (since the revset >> did >> not know it could stop walking at public nodes). >> >> The new algorithm uses changelog.ancestors() directly and provides a >> function to >> stop the iteration when we reach a public node. During a commit to a repo >> with >> hundreds of thousands of commits, this change reduces the inhibitmarkers >> time >> from 1.5 seconds to effectively 0 seconds. >> >> diff --git a/hgext/inhibit.py b/hgext/inhibit.py >> --- a/hgext/inhibit.py >> +++ b/hgext/inhibit.py >> @@ -23,6 +23,7 @@ from mercurial import scmutil >> from mercurial import commands >> from mercurial import lock as lockmod >> from mercurial import bookmarks >> +from mercurial import phases >> from mercurial import util >> from mercurial.i18n import _ >> >> @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): >> if not _inhibitenabled(repo): >> return >> >> - newinhibit = repo.set('::%ln and obsolete()', nodes) >> + newinhibit = set() >> + revs = [repo[node].rev() for node in nodes] >> + phase = repo._phasecache.phase >> + obsoletes = obsolete.getrevs(repo, 'obsolete') >> + for anc in repo.changelog.ancestors(revs, inclusive=True, >> + stopfunc=lambda rev: phase(repo, rev) == phases.public): >> + if anc in obsoletes: >> + newinhibit.add(repo[anc]) >> + >> >> > I must be missing something obvious: why can't this be written as: > > obsoletes = ... > for rev in repo.revs('::%ln and not public()'): > if rev in obsoletes: > ... > > I don't understand why stopfunc is needed and why a revset can't do the > same thing. > > The way "::X and not public()" works is it iterates down the ancestor line > and checks each commit to see if it's not public. The iterator has no > knowledge that finding something that is public means it can stop > iterating, so it will iterate over all the commits. > > In theory we could try to make revsets smarter (i.e. by allowing the > ancestor revset to communicate to the public revset that it's doing a > descendant traversal, and allowing the public revset to signal a stop), but > that's rather complex, and the implementation would probably end up adding > the stopfunc to lazyancestors anyway (since that's how you'd likely > implement it). > I see. Thanks for explaining. And here I thought all that work in recent releases to drastically speed up "not public()" implemented things like this because it had to.
On 11/8/15 8:46 AM, Gregory Szorc wrote: > On Sun, Nov 8, 2015 at 8:38 AM, Durham Goode <durham@fb.com > <mailto:durham@fb.com>> wrote: > > > > On 11/8/15 8:28 AM, Gregory Szorc wrote: >> On Sat, Nov 7, 2015 at 5:20 PM, Durham Goode <durham@fb.com >> <mailto:durham@fb.com>> wrote: >> >> # HG changeset patch >> # User Durham Goode <durham@fb.com <mailto:durham@fb.com>> >> # Date 1446945001 28800 >> # Sat Nov 07 17:10:01 2015 -0800 >> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >> inhibit: improve transaction marker perf >> >> The old algorithm was a revset "::X and obsolete()". This was >> inefficient because >> it requires walking all the way down the ancestor chain >> (since the revset did >> not know it could stop walking at public nodes). >> >> The new algorithm uses changelog.ancestors() directly and >> provides a function to >> stop the iteration when we reach a public node. During a >> commit to a repo with >> hundreds of thousands of commits, this change reduces the >> inhibitmarkers time >> from 1.5 seconds to effectively 0 seconds. >> >> diff --git a/hgext/inhibit.py b/hgext/inhibit.py >> --- a/hgext/inhibit.py >> +++ b/hgext/inhibit.py >> @@ -23,6 +23,7 @@ from mercurial import scmutil >> from mercurial import commands >> from mercurial import lock as lockmod >> from mercurial import bookmarks >> +from mercurial import phases >> from mercurial import util >> from mercurial.i18n import _ >> >> @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): >> if not _inhibitenabled(repo): >> return >> >> - newinhibit = repo.set('::%ln and obsolete()', nodes) >> + newinhibit = set() >> + revs = [repo[node].rev() for node in nodes] >> + phase = repo._phasecache.phase >> + obsoletes = obsolete.getrevs(repo, 'obsolete') >> + for anc in repo.changelog.ancestors(revs, inclusive=True, >> + stopfunc=lambda rev: phase(repo, rev) == >> phases.public): >> + if anc in obsoletes: >> + newinhibit.add(repo[anc]) >> + >> >> >> I must be missing something obvious: why can't this be written as: >> >> obsoletes = ... >> for rev in repo.revs('::%ln and not public()'): >> if rev in obsoletes: >> ... >> >> I don't understand why stopfunc is needed and why a revset can't >> do the same thing. > The way "::X and not public()" works is it iterates down the > ancestor line and checks each commit to see if it's not public. > The iterator has no knowledge that finding something that is > public means it can stop iterating, so it will iterate over all > the commits. > > In theory we could try to make revsets smarter (i.e. by allowing > the ancestor revset to communicate to the public revset that it's > doing a descendant traversal, and allowing the public revset to > signal a stop), but that's rather complex, and the implementation > would probably end up adding the stopfunc to lazyancestors anyway > (since that's how you'd likely implement it). > > > I see. Thanks for explaining. And here I thought all that work in > recent releases to drastically speed up "not public()" implemented > things like this because it had to. "not public()" is optimized, so it's quick. But testing a series of commits against it is still O(size of series). In theory we could optimize "::X and not public()" to iterate over the "not public()" set instead of the ancestors, then test against the ancestor set (which is cheaper), but that's still O(distance from first not-public commit), which is still too slow in this case.
On 11/8/15 12:04 AM, Martin von Zweigbergk wrote: > > > On Sat, Nov 7, 2015 at 10:15 PM Durham Goode <durham@fb.com > <mailto:durham@fb.com>> wrote: > > > > On 11/7/15 9:52 PM, Martin von Zweigbergk wrote: >> >> >> On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham@fb.com >> <mailto:durham@fb.com>> wrote: >> >> # HG changeset patch >> # User Durham Goode <durham@fb.com <mailto:durham@fb.com>> >> # Date 1446945001 28800 >> # Sat Nov 07 17:10:01 2015 -0800 >> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >> inhibit: improve transaction marker perf >> >> The old algorithm was a revset "::X and obsolete()". This was >> inefficient because >> it requires walking all the way down the ancestor chain >> (since the revset did >> not know it could stop walking at public nodes). >> >> >> I was hoping to reproduce the slowness on the Mozilla repo (270k >> revisions), but "hg log -r '::tip and obsolete()'" runs in 180 >> ms. Do you have a better command for me to try? How many >> obsmarkers in the repo you tried it on? > You need to make sure you pass --hidden, otherwise the obsolete() > revset resolves to an empty set and tests against it are cheap. > > > I'm still not able to reproduce this :-( Some tests take ~800 ms, but > most of that time seems to be related to loading obsmarkers and not > about iterating over the revset. > > I wanted to see if I could get the same results by playing with the > revset optimizer. I have never looked at that code before and I don't > know if it's a good idea. If it is, I'll have to let you do it > yourself since I can't even test it. I looked into this a bit more. I think this performance is due to the nature of our largest repo. We had three large repos which were then merged together, so there are three distinct branches in history. The --profile of this revset is as follows: ~/foo> time hg log -r '::tip and obsolete()' --hidden --profile | 100.0% cmdutil.py: getlogrevs line 4792: revs, expr, filematcher = c... \ 93.3% revset.py: __nonzero__ line 2135: if not revs: | 93.3% revset.py: _iterfilter line 3110: for r in it: | 93.3% revset.py: _desccontains line 3084: if cond(x): | 89.6% revset.py: _consumegen line 3460: for l in self._consumegen(): | 82.2% revset.py: iterate line 3508: for item in self._gen: | 30.4% changelog.py: parentrevs line 56: for parent in cl.parentrevs... | 13.3% revlog.py: parentrevs line 231: return super(changelog, sel... Most of the time is in iterate, which is where the heapq management goes. I think having the three branches in history cause more heap work than without (in fact, if I do "(A::tip | B::tip | C::tip) and obsolete()", where A B and C are roots of the three histories, it's way faster than just ::tip and the iterate disappears from the profile. Either way, I think my fix still applies, since it fixes the O() entirely.
On Sun, Nov 8, 2015 at 8:53 AM Durham Goode <durham@fb.com> wrote: > > > On 11/8/15 8:46 AM, Gregory Szorc wrote: > > On Sun, Nov 8, 2015 at 8:38 AM, Durham Goode <durham@fb.com> wrote: > >> >> >> On 11/8/15 8:28 AM, Gregory Szorc wrote: >> >> On Sat, Nov 7, 2015 at 5:20 PM, Durham Goode < <durham@fb.com> >> durham@fb.com> wrote: >> >>> # HG changeset patch >>> # User Durham Goode < <durham@fb.com>durham@fb.com> >>> # Date 1446945001 28800 >>> # Sat Nov 07 17:10:01 2015 -0800 >>> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >>> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >>> inhibit: improve transaction marker perf >>> >>> The old algorithm was a revset "::X and obsolete()". This was >>> inefficient because >>> it requires walking all the way down the ancestor chain (since the >>> revset did >>> not know it could stop walking at public nodes). >>> >>> The new algorithm uses changelog.ancestors() directly and provides a >>> function to >>> stop the iteration when we reach a public node. During a commit to a >>> repo with >>> hundreds of thousands of commits, this change reduces the inhibitmarkers >>> time >>> from 1.5 seconds to effectively 0 seconds. >>> >>> diff --git a/hgext/inhibit.py b/hgext/inhibit.py >>> --- a/hgext/inhibit.py >>> +++ b/hgext/inhibit.py >>> @@ -23,6 +23,7 @@ from mercurial import scmutil >>> from mercurial import commands >>> from mercurial import lock as lockmod >>> from mercurial import bookmarks >>> +from mercurial import phases >>> from mercurial import util >>> from mercurial.i18n import _ >>> >>> @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): >>> if not _inhibitenabled(repo): >>> return >>> >>> - newinhibit = repo.set('::%ln and obsolete()', nodes) >>> + newinhibit = set() >>> + revs = [repo[node].rev() for node in nodes] >>> + phase = repo._phasecache.phase >>> + obsoletes = obsolete.getrevs(repo, 'obsolete') >>> + for anc in repo.changelog.ancestors(revs, inclusive=True, >>> + stopfunc=lambda rev: phase(repo, rev) == phases.public): >>> + if anc in obsoletes: >>> + newinhibit.add(repo[anc]) >>> + >>> >>> >> I must be missing something obvious: why can't this be written as: >> >> obsoletes = ... >> for rev in repo.revs('::%ln and not public()'): >> if rev in obsoletes: >> ... >> >> I don't understand why stopfunc is needed and why a revset can't do the >> same thing. >> >> The way "::X and not public()" works is it iterates down the ancestor >> line and checks each commit to see if it's not public. The iterator has no >> knowledge that finding something that is public means it can stop >> iterating, so it will iterate over all the commits. >> >> In theory we could try to make revsets smarter (i.e. by allowing the >> ancestor revset to communicate to the public revset that it's doing a >> descendant traversal, and allowing the public revset to signal a stop), but >> that's rather complex, and the implementation would probably end up adding >> the stopfunc to lazyancestors anyway (since that's how you'd likely >> implement it). >> > > I see. Thanks for explaining. And here I thought all that work in recent > releases to drastically speed up "not public()" implemented things like > this because it had to. > > "not public()" is optimized, so it's quick. But testing a series of > commits against it is still O(size of series). In theory we could optimize > "::X and not public()" to iterate over the "not public()" set instead of > the ancestors, then test against the ancestor set (which is cheaper), but > that's still O(distance from first not-public commit), which is still too > slow in this case. > I think that's actually how it already works. Just time "hg log -T. -r ::@" vs "hg log -T. -r '::@ and not public()'". It seems like a good thing if "::X and obsolete()" could be optimized rather than avoiding use of it, but that's of course much more work. Is it likely enough that we will ever optimize that that we want to keep track of it somehow (maybe "# This works around '::%ln and obsolete()' which is not yet optimized")? I don't know how we usually handle this kind of thing. > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel >
On 11/08/2015 09:24 AM, Durham Goode wrote: > > > On 11/8/15 12:04 AM, Martin von Zweigbergk wrote: >> >> >> On Sat, Nov 7, 2015 at 10:15 PM Durham Goode <durham@fb.com >> <mailto:durham@fb.com>> wrote: >> >> >> >> On 11/7/15 9:52 PM, Martin von Zweigbergk wrote: >>> >>> >>> On Sat, Nov 7, 2015 at 5:21 PM Durham Goode <durham@fb.com >>> <mailto:durham@fb.com>> wrote: >>> >>> # HG changeset patch >>> # User Durham Goode <durham@fb.com <mailto:durham@fb.com>> >>> # Date 1446945001 28800 >>> # Sat Nov 07 17:10:01 2015 -0800 >>> # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 >>> # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 >>> inhibit: improve transaction marker perf >>> >>> The old algorithm was a revset "::X and obsolete()". This was >>> inefficient because >>> it requires walking all the way down the ancestor chain >>> (since the revset did >>> not know it could stop walking at public nodes). >>> >>> >>> I was hoping to reproduce the slowness on the Mozilla repo (270k >>> revisions), but "hg log -r '::tip and obsolete()'" runs in 180 >>> ms. Do you have a better command for me to try? How many >>> obsmarkers in the repo you tried it on? >> You need to make sure you pass --hidden, otherwise the obsolete() >> revset resolves to an empty set and tests against it are cheap. >> >> >> I'm still not able to reproduce this :-( Some tests take ~800 ms, but >> most of that time seems to be related to loading obsmarkers and not >> about iterating over the revset. >> >> I wanted to see if I could get the same results by playing with the >> revset optimizer. I have never looked at that code before and I don't >> know if it's a good idea. If it is, I'll have to let you do it >> yourself since I can't even test it. > I looked into this a bit more. I think this performance is due to the > nature of our largest repo. We had three large repos which were then > merged together, so there are three distinct branches in history. The > --profile of this revset is as follows: > > ~/foo> time hg log -r '::tip and obsolete()' --hidden --profile > | 100.0% cmdutil.py: getlogrevs line 4792: revs, > expr, filematcher = c... > \ 93.3% revset.py: __nonzero__ line 2135: if not revs: > | 93.3% revset.py: _iterfilter line 3110: for r in it: > | 93.3% revset.py: _desccontains line 3084: if cond(x): > | 89.6% revset.py: _consumegen line 3460: for l in > self._consumegen(): > | 82.2% revset.py: iterate line 3508: for item > in self._gen: > | 30.4% changelog.py: parentrevs line 56: for parent > in cl.parentrevs... > | 13.3% revlog.py: parentrevs line 231: return > super(changelog, sel... > > Most of the time is in iterate, which is where the heapq management > goes. I think having the three branches in history cause more heap work > than without (in fact, if I do "(A::tip | B::tip | C::tip) and > obsolete()", where A B and C are roots of the three histories, it's way > faster than just ::tip and the iterate disappears from the profile. > > Either way, I think my fix still applies, since it fixes the O() entirely. A::tip is a whole different algorithm than ::tip. It is non lazy and in C. I'm not surprise this boost things up.
On 11/07/2015 05:20 PM, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1446945001 28800 > # Sat Nov 07 17:10:01 2015 -0800 > # Node ID 7c680f209f7af35c7c476eecc2f9eec13b32ad62 > # Parent 48547b4c77defdd17c670b1eb0eb94272edf0207 > inhibit: improve transaction marker perf We found a bug in the original code that resulted in inhibit store to be rewritten in all cases. I've pushed a fix. We've also used some dirty trick on the revset to make it acceptably fast.
Patch
diff --git a/hgext/inhibit.py b/hgext/inhibit.py --- a/hgext/inhibit.py +++ b/hgext/inhibit.py @@ -23,6 +23,7 @@ from mercurial import scmutil from mercurial import commands from mercurial import lock as lockmod from mercurial import bookmarks +from mercurial import phases from mercurial import util from mercurial.i18n import _ @@ -129,7 +130,15 @@ def _inhibitmarkers(repo, nodes): if not _inhibitenabled(repo): return - newinhibit = repo.set('::%ln and obsolete()', nodes) + newinhibit = set() + revs = [repo[node].rev() for node in nodes] + phase = repo._phasecache.phase + obsoletes = obsolete.getrevs(repo, 'obsolete') + for anc in repo.changelog.ancestors(revs, inclusive=True, + stopfunc=lambda rev: phase(repo, rev) == phases.public): + if anc in obsoletes: + newinhibit.add(repo[anc]) + if newinhibit: lock = tr = None try: