Submitter | Jun Wu |
---|---|
Date | June 3, 2017, 9:06 p.m. |
Message ID | <a5f77662c4f22467b84f.1496523998@x1c> |
Download | mbox | patch |
Permalink | /patch/21164/ |
State | Deferred, archived |
Headers | show |
Comments
> On Jun 3, 2017, at 5:06 PM, Jun Wu <quark@fb.com> wrote: > > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1496515319 25200 > # Sat Jun 03 11:41:59 2017 -0700 > # Node ID a5f77662c4f22467b84fc3ce494998d23b0daa82 > # Parent 783394c0c97807e83daad9da561179bd0719e159 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r a5f77662c4f2 > smartset: add a "toset" method > > This allows us to convert a smartset to a Python set using a customized > approach. Namely, baseset may have a "_set" property already which could be > used directly to avoid __iter__ overhead. > > diff --git a/mercurial/smartset.py b/mercurial/smartset.py > --- a/mercurial/smartset.py > +++ b/mercurial/smartset.py > @@ -156,4 +156,8 @@ class abstractsmartset(object): > return filteredset(self, condition, condrepr) > > + def toset(self): > + """Convert to unordered Python set""" > + return set(self) > + > class baseset(abstractsmartset): > """Basic data structure that represents a revset and contains the basic > @@ -169,5 +173,6 @@ class baseset(abstractsmartset): > > Construct by a set: > - >>> xs = baseset(set(x)) > + >>> xset = set(x) > + >>> xs = baseset(xset) >>>> ys = baseset(set(y)) >>>> [list(i) for i in [xs + ys, xs & ys, xs - ys]] > @@ -175,4 +180,6 @@ class baseset(abstractsmartset): >>>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]] > ['addset', 'baseset', 'baseset'] > + >>> xset is xs.toset() > + True > > Construct by a list-like: > @@ -237,4 +244,7 @@ class baseset(abstractsmartset): > return set(self._list) > > + def toset(self): > + return self._set It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set? > + > @util.propertycache > def _asclist(self): > @@ -584,4 +594,8 @@ class addset(abstractsmartset): >>>> [x for x in rs] > [5, 4, 3, 2, 0] > + > + convert to Python set: > + >>> sorted(rs.toset()) > + [0, 2, 3, 4, 5] > """ > def __init__(self, revs1, revs2, ascending=None): > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 06/04/2017 01:07 AM, Augie Fackler wrote: > >> On Jun 3, 2017, at 5:06 PM, Jun Wu <quark@fb.com> wrote: >> >> # HG changeset patch >> # User Jun Wu <quark@fb.com> >> # Date 1496515319 25200 >> # Sat Jun 03 11:41:59 2017 -0700 >> # Node ID a5f77662c4f22467b84fc3ce494998d23b0daa82 >> # Parent 783394c0c97807e83daad9da561179bd0719e159 >> # Available At https://bitbucket.org/quark-zju/hg-draft >> # hg pull https://bitbucket.org/quark-zju/hg-draft -r a5f77662c4f2 >> smartset: add a "toset" method >> >> This allows us to convert a smartset to a Python set using a customized >> approach. Namely, baseset may have a "_set" property already which could be >> used directly to avoid __iter__ overhead. >> >> diff --git a/mercurial/smartset.py b/mercurial/smartset.py >> --- a/mercurial/smartset.py >> +++ b/mercurial/smartset.py >> @@ -156,4 +156,8 @@ class abstractsmartset(object): >> return filteredset(self, condition, condrepr) >> >> + def toset(self): >> + """Convert to unordered Python set""" >> + return set(self) >> + >> class baseset(abstractsmartset): >> """Basic data structure that represents a revset and contains the basic >> @@ -169,5 +173,6 @@ class baseset(abstractsmartset): >> >> Construct by a set: >> - >>> xs = baseset(set(x)) >> + >>> xset = set(x) >> + >>> xs = baseset(xset) >>>>> ys = baseset(set(y)) >>>>> [list(i) for i in [xs + ys, xs & ys, xs - ys]] >> @@ -175,4 +180,6 @@ class baseset(abstractsmartset): >>>>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]] >> ['addset', 'baseset', 'baseset'] >> + >>> xset is xs.toset() >> + True >> >> Construct by a list-like: >> @@ -237,4 +244,7 @@ class baseset(abstractsmartset): >> return set(self._list) >> >> + def toset(self): >> + return self._set > > It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set? Note: For the target code, copying sets is can be multiple percent of the total run time.
> On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > >>> + def toset(self): >>> + return self._set >> >> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set? > > Note: For the target code, copying sets is can be multiple percent of the total run time. Okay, it sounds like we should document that the API contract is that clients must not mutate the returned set. Does that sound workable to everyone?
On Jun 3, 2017 5:02 PM, "Augie Fackler" <raf@durin42.com> wrote: > On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon. org> wrote: > >>> + def toset(self): >>> + return self._set >> >> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set? > > Note: For the target code, copying sets is can be multiple percent of the total run time. Okay, it sounds like we should document that the API contract is that clients must not mutate the returned set. Does that sound workable to everyone? Sounds good to me. But I think the argument about "multiple percent of the total run time" is pretty silly when the total run time is about a millisecond (IIUC).
On 06/04/2017 02:55 AM, Martin von Zweigbergk wrote: > > > On Jun 3, 2017 5:02 PM, "Augie Fackler" <raf@durin42.com > <mailto:raf@durin42.com>> wrote: > > > > On Jun 3, 2017, at 7:29 PM, Pierre-Yves David > <pierre-yves.david@ens-lyon.org > <mailto:pierre-yves.david@ens-lyon.org>> wrote: > > > >>> + def toset(self): > >>> + return self._set > >> > >> It freaks me out just a little (maybe too much Rust today?) to > leak self._set mutably like this. What do you think of making a > copy? Should we just strongly admonish in the “convert to a set” > docstring that callers of toset() _must not_ mutate the returned set? > > > > Note: For the target code, copying sets is can be multiple percent > of the total run time. > > Okay, it sounds like we should document that the API contract is > that clients must not mutate the returned set. Does that sound > workable to everyone? That works for me (ideally, python would have an API to turn a set into a frozen set in place). Regarding this series, the new function currently do not have any user so I think we should wait a bit. Updating the whole API of smartset for something that we are not sure to use yet seems premature. I think Jun made his point in terms of "what he would like the API to looks like." and we should get back to that as part of a series actually using it (so that we can discuss the benefit). > Sounds good to me. > > But I think the argument about "multiple percent of the total run time" > is pretty silly when the total run time is about a millisecond (IIUC). I agree it is not too critical
On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote: > > > On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > > >>> + def toset(self): > >>> + return self._set > >> > >> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set? > > > > Note: For the target code, copying sets is can be multiple percent of the total run time. > > Okay, it sounds like we should document that the API contract is that clients must not mutate the returned set. Does that sound workable to everyone? Can't we change self._set to frozenset? smartset internals are cache heavy, mutating it would lead to subtle bugs.
Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900: > On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote: > > Okay, it sounds like we should document that the API contract is that > > clients must not mutate the returned set. Does that sound workable to > > everyone? > > Can't we change self._set to frozenset? smartset internals are cache heavy, > mutating it would lead to subtle bugs. I think Python is fundamentally unfriendly for marking data as immutable: - everything is mutable by default (unlike Rust) - set -> frozenset, list -> tuple are at least O(N) It feels like if "const" in C++ is not just a compiler hint but has runtime penalty, would you still use "const"? I guess my answer is no. Even if we fix baseset, it's likely that we still have many places need to be fixed, like obsstore._all. Therefore I prefer the documentation way to "fix" this issue.
On Sun, 4 Jun 2017 17:36:04 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900: > > On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote: > > > Okay, it sounds like we should document that the API contract is that > > > clients must not mutate the returned set. Does that sound workable to > > > everyone? > > > > Can't we change self._set to frozenset? smartset internals are cache heavy, > > mutating it would lead to subtle bugs. > > I think Python is fundamentally unfriendly for marking data as immutable: > > - everything is mutable by default (unlike Rust) > - set -> frozenset, list -> tuple are at least O(N) > > It feels like if "const" in C++ is not just a compiler hint but has runtime > penalty, would you still use "const"? > > I guess my answer is no. Even if we fix baseset, it's likely that we still > have many places need to be fixed, like obsstore._all. > > Therefore I prefer the documentation way to "fix" this issue. Okay. Then, my two cents is naming the function with underscore-prefix (e.g. _asset()) to trigger extra attention.
On Jun 5, 2017 6:22 AM, "Yuya Nishihara" <yuya@tcha.org> wrote: On Sun, 4 Jun 2017 17:36:04 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900: > > On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote: > > > Okay, it sounds like we should document that the API contract is that > > > clients must not mutate the returned set. Does that sound workable to > > > everyone? > > > > Can't we change self._set to frozenset? smartset internals are cache heavy, > > mutating it would lead to subtle bugs. > > I think Python is fundamentally unfriendly for marking data as immutable: > > - everything is mutable by default (unlike Rust) > - set -> frozenset, list -> tuple are at least O(N) > > It feels like if "const" in C++ is not just a compiler hint but has runtime > penalty, would you still use "const"? > > I guess my answer is no. Even if we fix baseset, it's likely that we still > have many places need to be fixed, like obsstore._all. > > Therefore I prefer the documentation way to "fix" this issue. Okay. Then, my two cents is naming the function with underscore-prefix (e.g. _asset()) to trigger extra attention. I agree with Yuya. However, given our lack of underscores, "asset" is not the best name. And please add at least one call site if you resend.
Patch
diff --git a/mercurial/smartset.py b/mercurial/smartset.py --- a/mercurial/smartset.py +++ b/mercurial/smartset.py @@ -156,4 +156,8 @@ class abstractsmartset(object): return filteredset(self, condition, condrepr) + def toset(self): + """Convert to unordered Python set""" + return set(self) + class baseset(abstractsmartset): """Basic data structure that represents a revset and contains the basic @@ -169,5 +173,6 @@ class baseset(abstractsmartset): Construct by a set: - >>> xs = baseset(set(x)) + >>> xset = set(x) + >>> xs = baseset(xset) >>> ys = baseset(set(y)) >>> [list(i) for i in [xs + ys, xs & ys, xs - ys]] @@ -175,4 +180,6 @@ class baseset(abstractsmartset): >>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]] ['addset', 'baseset', 'baseset'] + >>> xset is xs.toset() + True Construct by a list-like: @@ -237,4 +244,7 @@ class baseset(abstractsmartset): return set(self._list) + def toset(self): + return self._set + @util.propertycache def _asclist(self): @@ -584,4 +594,8 @@ class addset(abstractsmartset): >>> [x for x in rs] [5, 4, 3, 2, 0] + + convert to Python set: + >>> sorted(rs.toset()) + [0, 2, 3, 4, 5] """ def __init__(self, revs1, revs2, ascending=None):