Submitter | Yuya Nishihara |
---|---|
Date | Feb. 4, 2015, 1:27 p.m. |
Message ID | <f65a44d4905a5b4767e4.1423056423@mimosa> |
Download | mbox | patch |
Permalink | /patch/7657/ |
State | Accepted |
Headers | show |
Comments
On 02/04/2015 01:27 PM, Yuya Nishihara wrote: > # HG changeset patch > # User Yuya Nishihara <yuya@tcha.org> > # Date 1423052757 -32400 > # Wed Feb 04 21:25:57 2015 +0900 > # Node ID f65a44d4905a5b4767e4b901de6ed1efe703611f > # Parent e1dbe0b215ae137eec53ceb12440536d570a83d2 > revlog: add __contains__ for fast membership test > > Because revlog implements __iter__, "rev in revlog" works but does silly O(n) > lookup unexpectedly. So it seems good to add fast version of __contains__. > > This allows "rev in repo.changelog" in the next patch. Pushed to the clowncopter with a couple of follow up idea. > > diff --git a/mercurial/changelog.py b/mercurial/changelog.py > --- a/mercurial/changelog.py > +++ b/mercurial/changelog.py > @@ -143,6 +143,11 @@ class changelog(revlog.revlog): > if i not in self.filteredrevs: > return self.node(i) > > + def __contains__(self, rev): > + """filtered version of revlog.__contains__""" > + return (revlog.revlog.__contains__(self, rev) > + and rev not in self.filteredrevs) I think there is some magical optimisation to do here, like hard coding the __contains__ content and maybe test for the existence of any filtered revision before checking for members
diff --git a/mercurial/changelog.py b/mercurial/changelog.py > --- a/mercurial/changelog.py > +++ b/mercurial/changelog.py > @@ -143,6 +143,11 @@ class changelog(revlog.revlog): > if i not in self.filteredrevs: > return self.node(i) > > + def __contains__(self, rev): > + """filtered version of revlog.__contains__""" > + return (revlog.revlog.__contains__(self, rev) > + and rev not in self.filteredrevs) > + > What you've implemented here is changelog.hasunfilteredrev(). I'm not convinced __contains__ should be implemented: a) to only accept numeric revisions (what about nodes) b) to only operate on unfiltered revs "a" applies to revlog.__contains__ as well.
On Wed, 04 Feb 2015 16:21:05 +0000, Pierre-Yves David wrote: > On 02/04/2015 01:27 PM, Yuya Nishihara wrote: > > + def __contains__(self, rev): > > + """filtered version of revlog.__contains__""" > > + return (revlog.revlog.__contains__(self, rev) > > + and rev not in self.filteredrevs) > > I think there is some magical optimisation to do here, like hard coding > the __contains__ content and maybe test for the existence of any > filtered revision before checking for members I thought about it, but I didn't. There's no hot loop calling __contains__, so I can't say it has benefit now. Regards,
On Wed, 4 Feb 2015 08:29:08 -0800, Gregory Szorc wrote: > diff --git a/mercurial/changelog.py b/mercurial/changelog.py > > --- a/mercurial/changelog.py > > +++ b/mercurial/changelog.py > > @@ -143,6 +143,11 @@ class changelog(revlog.revlog): > > if i not in self.filteredrevs: > > return self.node(i) > > > > + def __contains__(self, rev): > > + """filtered version of revlog.__contains__""" > > + return (revlog.revlog.__contains__(self, rev) > > + and rev not in self.filteredrevs) > > + > > > > What you've implemented here is changelog.hasunfilteredrev(). I'm not > convinced __contains__ should be implemented: > > a) to only accept numeric revisions (what about nodes) > b) to only operate on unfiltered revs > > "a" applies to revlog.__contains__ as well. What about hasrev(rev) because there's hasnode(node) ? Anyway, I think __contains__ should be implemented to ban use of slow "x in repo.changelog". Regards,
On 02/04/2015 04:29 PM, Gregory Szorc wrote: > diff --git a/mercurial/changelog.py b/mercurial/changelog.py > > --- a/mercurial/changelog.py > +++ b/mercurial/changelog.py > @@ -143,6 +143,11 @@ class changelog(revlog.revlog): > if i not in self.filteredrevs: > return self.node(i) > > + def __contains__(self, rev): > + """filtered version of revlog.__contains__""" > + return (revlog.revlog.__contains__(self, rev) > + and rev not in self.filteredrevs) > + > > > What you've implemented here is changelog.hasunfilteredrev(). I'm not > convinced __contains__ should be implemented: __contains__ should be implemented because otherwise, "rev in changelog" is still valid but ultra slow (uses __iter__). > a) to only accept numeric revisions (what about nodes) The current semantic accepts revs only and I'm fine with that. > b) to only operate on unfiltered revs No it does not? It looks at `self.filteredrevs`.
On 02/05/2015 12:03 PM, Yuya Nishihara wrote: > On Wed, 04 Feb 2015 16:21:05 +0000, Pierre-Yves David wrote: >> On 02/04/2015 01:27 PM, Yuya Nishihara wrote: >>> + def __contains__(self, rev): >>> + """filtered version of revlog.__contains__""" >>> + return (revlog.revlog.__contains__(self, rev) >>> + and rev not in self.filteredrevs) >> >> I think there is some magical optimisation to do here, like hard coding >> the __contains__ content and maybe test for the existence of any >> filtered revision before checking for members > > I thought about it, but I didn't. There's no hot loop calling __contains__, > so I can't say it has benefit now. I think that hardlinking the revlog.__contains__ is simple and a pure win. The rest can probably wait for real usecase.
Patch
diff --git a/mercurial/changelog.py b/mercurial/changelog.py --- a/mercurial/changelog.py +++ b/mercurial/changelog.py @@ -143,6 +143,11 @@ class changelog(revlog.revlog): if i not in self.filteredrevs: return self.node(i) + def __contains__(self, rev): + """filtered version of revlog.__contains__""" + return (revlog.revlog.__contains__(self, rev) + and rev not in self.filteredrevs) + def __iter__(self): """filtered version of revlog.__iter__""" if len(self.filteredrevs) == 0: diff --git a/mercurial/revlog.py b/mercurial/revlog.py --- a/mercurial/revlog.py +++ b/mercurial/revlog.py @@ -277,6 +277,8 @@ class revlog(object): def tip(self): return self.node(len(self.index) - 2) + def __contains__(self, rev): + return 0 <= rev < len(self) def __len__(self): return len(self.index) - 1 def __iter__(self):