Patchwork [V2] smartset: add a "toset" method

login
register
mail settings
Submitter Jun Wu
Date June 3, 2017, 9:06 p.m.
Message ID <a5f77662c4f22467b84f.1496523998@x1c>
Download mbox | patch
Permalink /patch/21164/
State Deferred, archived
Headers show

Comments

Jun Wu - June 3, 2017, 9:06 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1496515319 25200
#      Sat Jun 03 11:41:59 2017 -0700
# Node ID a5f77662c4f22467b84fc3ce494998d23b0daa82
# Parent  783394c0c97807e83daad9da561179bd0719e159
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r a5f77662c4f2
smartset: add a "toset" method

This allows us to convert a smartset to a Python set using a customized
approach.  Namely, baseset may have a "_set" property already which could be
used directly to avoid __iter__ overhead.
Augie Fackler - June 3, 2017, 11:07 p.m.
> On Jun 3, 2017, at 5:06 PM, Jun Wu <quark@fb.com> wrote:
> 
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1496515319 25200
> #      Sat Jun 03 11:41:59 2017 -0700
> # Node ID a5f77662c4f22467b84fc3ce494998d23b0daa82
> # Parent  783394c0c97807e83daad9da561179bd0719e159
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r a5f77662c4f2
> smartset: add a "toset" method
> 
> This allows us to convert a smartset to a Python set using a customized
> approach.  Namely, baseset may have a "_set" property already which could be
> used directly to avoid __iter__ overhead.
> 
> diff --git a/mercurial/smartset.py b/mercurial/smartset.py
> --- a/mercurial/smartset.py
> +++ b/mercurial/smartset.py
> @@ -156,4 +156,8 @@ class abstractsmartset(object):
>         return filteredset(self, condition, condrepr)
> 
> +    def toset(self):
> +        """Convert to unordered Python set"""
> +        return set(self)
> +
> class baseset(abstractsmartset):
>     """Basic data structure that represents a revset and contains the basic
> @@ -169,5 +173,6 @@ class baseset(abstractsmartset):
> 
>     Construct by a set:
> -    >>> xs = baseset(set(x))
> +    >>> xset = set(x)
> +    >>> xs = baseset(xset)
>>>> ys = baseset(set(y))
>>>> [list(i) for i in [xs + ys, xs & ys, xs - ys]]
> @@ -175,4 +180,6 @@ class baseset(abstractsmartset):
>>>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]]
>     ['addset', 'baseset', 'baseset']
> +    >>> xset is xs.toset()
> +    True
> 
>     Construct by a list-like:
> @@ -237,4 +244,7 @@ class baseset(abstractsmartset):
>         return set(self._list)
> 
> +    def toset(self):
> +        return self._set

It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set?

> +
>     @util.propertycache
>     def _asclist(self):
> @@ -584,4 +594,8 @@ class addset(abstractsmartset):
>>>> [x for x in rs]
>     [5, 4, 3, 2, 0]
> +
> +    convert to Python set:
> +    >>> sorted(rs.toset())
> +    [0, 2, 3, 4, 5]
>     """
>     def __init__(self, revs1, revs2, ascending=None):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pierre-Yves David - June 3, 2017, 11:29 p.m.
On 06/04/2017 01:07 AM, Augie Fackler wrote:
>
>> On Jun 3, 2017, at 5:06 PM, Jun Wu <quark@fb.com> wrote:
>>
>> # HG changeset patch
>> # User Jun Wu <quark@fb.com>
>> # Date 1496515319 25200
>> #      Sat Jun 03 11:41:59 2017 -0700
>> # Node ID a5f77662c4f22467b84fc3ce494998d23b0daa82
>> # Parent  783394c0c97807e83daad9da561179bd0719e159
>> # Available At https://bitbucket.org/quark-zju/hg-draft
>> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r a5f77662c4f2
>> smartset: add a "toset" method
>>
>> This allows us to convert a smartset to a Python set using a customized
>> approach.  Namely, baseset may have a "_set" property already which could be
>> used directly to avoid __iter__ overhead.
>>
>> diff --git a/mercurial/smartset.py b/mercurial/smartset.py
>> --- a/mercurial/smartset.py
>> +++ b/mercurial/smartset.py
>> @@ -156,4 +156,8 @@ class abstractsmartset(object):
>>         return filteredset(self, condition, condrepr)
>>
>> +    def toset(self):
>> +        """Convert to unordered Python set"""
>> +        return set(self)
>> +
>> class baseset(abstractsmartset):
>>     """Basic data structure that represents a revset and contains the basic
>> @@ -169,5 +173,6 @@ class baseset(abstractsmartset):
>>
>>     Construct by a set:
>> -    >>> xs = baseset(set(x))
>> +    >>> xset = set(x)
>> +    >>> xs = baseset(xset)
>>>>> ys = baseset(set(y))
>>>>> [list(i) for i in [xs + ys, xs & ys, xs - ys]]
>> @@ -175,4 +180,6 @@ class baseset(abstractsmartset):
>>>>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]]
>>     ['addset', 'baseset', 'baseset']
>> +    >>> xset is xs.toset()
>> +    True
>>
>>     Construct by a list-like:
>> @@ -237,4 +244,7 @@ class baseset(abstractsmartset):
>>         return set(self._list)
>>
>> +    def toset(self):
>> +        return self._set
>
> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set?

Note: For the target code, copying sets is can be multiple percent of 
the total run time.
Augie Fackler - June 4, 2017, 12:02 a.m.
> On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
>>> +    def toset(self):
>>> +        return self._set
>> 
>> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set?
> 
> Note: For the target code, copying sets is can be multiple percent of the total run time.

Okay, it sounds like we should document that the API contract is that clients must not mutate the returned set. Does that sound workable to everyone?
via Mercurial-devel - June 4, 2017, 12:55 a.m.
On Jun 3, 2017 5:02 PM, "Augie Fackler" <raf@durin42.com> wrote:


> On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.
org> wrote:
>
>>> +    def toset(self):
>>> +        return self._set
>>
>> It freaks me out just a little (maybe too much Rust today?) to leak
self._set mutably like this. What do you think of making a copy? Should we
just strongly admonish in the “convert to a set” docstring that callers of
toset() _must not_ mutate the returned set?
>
> Note: For the target code, copying sets is can be multiple percent of the
total run time.

Okay, it sounds like we should document that the API contract is that
clients must not mutate the returned set. Does that sound workable to
everyone?


Sounds good to me.

But I think the argument about "multiple percent of the total run time" is
pretty silly when the total run time is about a millisecond (IIUC).
Pierre-Yves David - June 4, 2017, 1:28 a.m.
On 06/04/2017 02:55 AM, Martin von Zweigbergk wrote:
>
>
> On Jun 3, 2017 5:02 PM, "Augie Fackler" <raf@durin42.com
> <mailto:raf@durin42.com>> wrote:
>
>
>     > On Jun 3, 2017, at 7:29 PM, Pierre-Yves David
>     <pierre-yves.david@ens-lyon.org
>     <mailto:pierre-yves.david@ens-lyon.org>> wrote:
>     >
>     >>> +    def toset(self):
>     >>> +        return self._set
>     >>
>     >> It freaks me out just a little (maybe too much Rust today?) to
>     leak self._set mutably like this. What do you think of making a
>     copy? Should we just strongly admonish in the “convert to a set”
>     docstring that callers of toset() _must not_ mutate the returned set?
>     >
>     > Note: For the target code, copying sets is can be multiple percent
>     of the total run time.
>
>     Okay, it sounds like we should document that the API contract is
>     that clients must not mutate the returned set. Does that sound
>     workable to everyone?

That works for me (ideally, python would have an API to turn a set into 
a frozen set in place).

Regarding this series, the new function currently do not have any user 
so I think we should wait a bit. Updating the whole API of smartset for 
something that we are not sure to use yet seems premature.

I think Jun made his point in terms of "what he would like the API to 
looks like." and we should get back to that as part of a series actually 
using it (so that we can discuss the benefit).

> Sounds good to me.
>
> But I think the argument about "multiple percent of the total run time"
> is pretty silly when the total run time is about a millisecond (IIUC).

I agree it is not too critical
Yuya Nishihara - June 4, 2017, 2:41 a.m.
On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote:
> 
> > On Jun 3, 2017, at 7:29 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> > 
> >>> +    def toset(self):
> >>> +        return self._set
> >> 
> >> It freaks me out just a little (maybe too much Rust today?) to leak self._set mutably like this. What do you think of making a copy? Should we just strongly admonish in the “convert to a set” docstring that callers of toset() _must not_ mutate the returned set?
> > 
> > Note: For the target code, copying sets is can be multiple percent of the total run time.
> 
> Okay, it sounds like we should document that the API contract is that clients must not mutate the returned set. Does that sound workable to everyone?

Can't we change self._set to frozenset? smartset internals are cache heavy,
mutating it would lead to subtle bugs.
Jun Wu - June 5, 2017, 12:36 a.m.
Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900:
> On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote:
> > Okay, it sounds like we should document that the API contract is that
> > clients must not mutate the returned set. Does that sound workable to
> > everyone?
> 
> Can't we change self._set to frozenset? smartset internals are cache heavy,
> mutating it would lead to subtle bugs.

I think Python is fundamentally unfriendly for marking data as immutable:

  - everything is mutable by default (unlike Rust)
  - set -> frozenset, list -> tuple are at least O(N)

It feels like if "const" in C++ is not just a compiler hint but has runtime
penalty, would you still use "const"?

I guess my answer is no. Even if we fix baseset, it's likely that we still
have many places need to be fixed, like obsstore._all.

Therefore I prefer the documentation way to "fix" this issue.
Yuya Nishihara - June 5, 2017, 1:16 p.m.
On Sun, 4 Jun 2017 17:36:04 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900:
> > On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote:
> > > Okay, it sounds like we should document that the API contract is that
> > > clients must not mutate the returned set. Does that sound workable to
> > > everyone?
> > 
> > Can't we change self._set to frozenset? smartset internals are cache heavy,
> > mutating it would lead to subtle bugs.
> 
> I think Python is fundamentally unfriendly for marking data as immutable:
> 
>   - everything is mutable by default (unlike Rust)
>   - set -> frozenset, list -> tuple are at least O(N)
> 
> It feels like if "const" in C++ is not just a compiler hint but has runtime
> penalty, would you still use "const"?
> 
> I guess my answer is no. Even if we fix baseset, it's likely that we still
> have many places need to be fixed, like obsstore._all.
> 
> Therefore I prefer the documentation way to "fix" this issue.

Okay. Then, my two cents is naming the function with underscore-prefix
(e.g. _asset()) to trigger extra attention.
via Mercurial-devel - June 5, 2017, 2:45 p.m.
On Jun 5, 2017 6:22 AM, "Yuya Nishihara" <yuya@tcha.org> wrote:

On Sun, 4 Jun 2017 17:36:04 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-06-04 11:41:26 +0900:
> > On Sat, 3 Jun 2017 20:02:28 -0400, Augie Fackler wrote:
> > > Okay, it sounds like we should document that the API contract is that
> > > clients must not mutate the returned set. Does that sound workable to
> > > everyone?
> >
> > Can't we change self._set to frozenset? smartset internals are cache
heavy,
> > mutating it would lead to subtle bugs.
>
> I think Python is fundamentally unfriendly for marking data as immutable:
>
>   - everything is mutable by default (unlike Rust)
>   - set -> frozenset, list -> tuple are at least O(N)
>
> It feels like if "const" in C++ is not just a compiler hint but has
runtime
> penalty, would you still use "const"?
>
> I guess my answer is no. Even if we fix baseset, it's likely that we still
> have many places need to be fixed, like obsstore._all.
>
> Therefore I prefer the documentation way to "fix" this issue.

Okay. Then, my two cents is naming the function with underscore-prefix
(e.g. _asset()) to trigger extra attention.


I agree with Yuya. However, given our lack of underscores, "asset" is not
the best name.

And please add at least one call site if you resend.

Patch

diff --git a/mercurial/smartset.py b/mercurial/smartset.py
--- a/mercurial/smartset.py
+++ b/mercurial/smartset.py
@@ -156,4 +156,8 @@  class abstractsmartset(object):
         return filteredset(self, condition, condrepr)
 
+    def toset(self):
+        """Convert to unordered Python set"""
+        return set(self)
+
 class baseset(abstractsmartset):
     """Basic data structure that represents a revset and contains the basic
@@ -169,5 +173,6 @@  class baseset(abstractsmartset):
 
     Construct by a set:
-    >>> xs = baseset(set(x))
+    >>> xset = set(x)
+    >>> xs = baseset(xset)
     >>> ys = baseset(set(y))
     >>> [list(i) for i in [xs + ys, xs & ys, xs - ys]]
@@ -175,4 +180,6 @@  class baseset(abstractsmartset):
     >>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]]
     ['addset', 'baseset', 'baseset']
+    >>> xset is xs.toset()
+    True
 
     Construct by a list-like:
@@ -237,4 +244,7 @@  class baseset(abstractsmartset):
         return set(self._list)
 
+    def toset(self):
+        return self._set
+
     @util.propertycache
     def _asclist(self):
@@ -584,4 +594,8 @@  class addset(abstractsmartset):
     >>> [x for x in rs]
     [5, 4, 3, 2, 0]
+
+    convert to Python set:
+    >>> sorted(rs.toset())
+    [0, 2, 3, 4, 5]
     """
     def __init__(self, revs1, revs2, ascending=None):