Patchwork smartset: add a "toset" method

login
register
mail settings
Submitter Jun Wu
Date June 3, 2017, 7:03 p.m.
Message ID <649ebbdd8944d2a7d8a9.1496516601@x1c>
Download mbox | patch
Permalink /patch/21163/
State Superseded
Headers show

Comments

Jun Wu - June 3, 2017, 7:03 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1496515319 25200
#      Sat Jun 03 11:41:59 2017 -0700
# Node ID 649ebbdd8944d2a7d8a9bb5e938886e7c26de1a2
# Parent  783394c0c97807e83daad9da561179bd0719e159
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 649ebbdd8944
smartset: add a "toset" method

This allows us to convert a smartset to a Python set using a customized
approach.  Namely, baseset may have a "_set" property already which could be
used directly to avoid __iter__ overhead.
Augie Fackler - June 3, 2017, 8:25 p.m.
On Sat, Jun 03, 2017 at 12:03:21PM -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1496515319 25200
> #      Sat Jun 03 11:41:59 2017 -0700
> # Node ID 649ebbdd8944d2a7d8a9bb5e938886e7c26de1a2
> # Parent  783394c0c97807e83daad9da561179bd0719e159
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 649ebbdd8944
> smartset: add a "toset" method
>
> This allows us to convert a smartset to a Python set using a customized
> approach.  Namely, baseset may have a "_set" property already which could be
> used directly to avoid __iter__ overhead.
>
> diff --git a/mercurial/smartset.py b/mercurial/smartset.py
> --- a/mercurial/smartset.py
> +++ b/mercurial/smartset.py
> @@ -156,4 +156,10 @@ class abstractsmartset(object):
>          return filteredset(self, condition, condrepr)
>
> +    def toset(self):
> +        """Convert to unordered Python set"""
> +        if not util.safehasattr(self, '_set'):
> +            self._set = set(self)
> +        return self._set

I like where you're headed with this, but I think I'd rather have
classes with an efficient way to produce a set implement toset
themselves and leave the default implementation as set(self) so we can
avoid the spooky action at a distance implied by this hasattr().

> +
>  class baseset(abstractsmartset):
>      """Basic data structure that represents a revset and contains the basic
> @@ -169,5 +175,6 @@ class baseset(abstractsmartset):
>
>      Construct by a set:
> -    >>> xs = baseset(set(x))
> +    >>> xset = set(x)
> +    >>> xs = baseset(xset)
>      >>> ys = baseset(set(y))
>      >>> [list(i) for i in [xs + ys, xs & ys, xs - ys]]
> @@ -175,4 +182,6 @@ class baseset(abstractsmartset):
>      >>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]]
>      ['addset', 'baseset', 'baseset']
> +    >>> xset is xs.toset()
> +    True
>
>      Construct by a list-like:
> @@ -584,4 +593,10 @@ class addset(abstractsmartset):
>      >>> [x for x in rs]
>      [5, 4, 3, 2, 0]
> +
> +    convert to Python set:
> +    >>> rs.toset() is rs.toset()
> +    True
> +    >>> sorted(rs.toset())
> +    [0, 2, 3, 4, 5]
>      """
>      def __init__(self, revs1, revs2, ascending=None):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Jun Wu - June 3, 2017, 8:54 p.m.
Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:
> I like where you're headed with this, but I think I'd rather have
> classes with an efficient way to produce a set implement toset
> themselves and leave the default implementation as set(self) so we can
> avoid the spooky action at a distance implied by this hasattr().

The problem is the lack of ability to customize "set(smartset)" logic in the
Python language. "set" will always treat "smartset" as an iterable, which
makes the operation O(N log N) even if we have O(1) fast path.

Between syntactic simplicity and ability to have fast code path, I want to
choose the latter.

For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
looks like a sign of poorly designed APIs (sometimes return a smartset,
sometimes return a set, so you never know what "x" is). It shouldn't happen
in a clean codebase.

> [...]
Jun Wu - June 3, 2017, 8:58 p.m.
Excerpts from Jun Wu's message of 2017-06-03 13:54:18 -0700:
> Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:
> > I like where you're headed with this, but I think I'd rather have
> > classes with an efficient way to produce a set implement toset
> > themselves and leave the default implementation as set(self) so we can
> > avoid the spooky action at a distance implied by this hasattr().
> 
> The problem is the lack of ability to customize "set(smartset)" logic in the
> Python language. "set" will always treat "smartset" as an iterable, which
> makes the operation O(N log N) even if we have O(1) fast path.
> 
> Between syntactic simplicity and ability to have fast code path, I want to
> choose the latter.
> 
> For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
> looks like a sign of poorly designed APIs (sometimes return a smartset,
> sometimes return a set, so you never know what "x" is). It shouldn't happen
> in a clean codebase.

Sorry. I misunderstood the whole paragraph. I'll move "toset" from
"abstractsmartset" to subclasses.

Good advice by the way!
Augie Fackler - June 3, 2017, 10:35 p.m.
> On Jun 3, 2017, at 4:58 PM, Jun Wu <quark@fb.com> wrote:
> 
> Excerpts from Jun Wu's message of 2017-06-03 13:54:18 -0700:
>> Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:
>>> I like where you're headed with this, but I think I'd rather have
>>> classes with an efficient way to produce a set implement toset
>>> themselves and leave the default implementation as set(self) so we can
>>> avoid the spooky action at a distance implied by this hasattr().
>> 
>> The problem is the lack of ability to customize "set(smartset)" logic in the
>> Python language. "set" will always treat "smartset" as an iterable, which
>> makes the operation O(N log N) even if we have O(1) fast path.
>> 
>> Between syntactic simplicity and ability to have fast code path, I want to
>> choose the latter.
>> 
>> For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
>> looks like a sign of poorly designed APIs (sometimes return a smartset,
>> sometimes return a set, so you never know what "x" is). It shouldn't happen
>> in a clean codebase.
> 
> Sorry. I misunderstood the whole paragraph. I'll move "toset" from
> "abstractsmartset" to subclasses.

You should probably still have a toset() on abstractsmartset if you intend for it to be a mandatory part of the interface (which I recommend, per Liskov), but have it either do set(self) or (probably better) raise NotImplementedError.

> 
> Good advice by the way!

Thanks! Glad you like it. :)
via Mercurial-devel - June 3, 2017, 10:48 p.m.
On Jun 3, 2017 3:36 PM, "Augie Fackler" <raf@durin42.com> wrote:


On Jun 3, 2017, at 4:58 PM, Jun Wu <quark@fb.com> wrote:

Excerpts from Jun Wu's message of 2017-06-03 13:54:18 -0700:

Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:

I like where you're headed with this,

Where is Jun headed? It'd be nice to see the user of this new method and
see the it helps there (in practice). The change is simple and makes sense,
but it's still dead code at this point.

but I think I'd rather have
classes with an efficient way to produce a set implement toset
themselves and leave the default implementation as set(self) so we can
avoid the spooky action at a distance implied by this hasattr().


The problem is the lack of ability to customize "set(smartset)" logic in the
Python language. "set" will always treat "smartset" as an iterable, which
makes the operation O(N log N) even if we have O(1) fast path.

Between syntactic simplicity and ability to have fast code path, I want to
choose the latter.

For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
looks like a sign of poorly designed APIs (sometimes return a smartset,
sometimes return a set, so you never know what "x" is). It shouldn't happen
in a clean codebase.


Sorry. I misunderstood the whole paragraph. I'll move "toset" from
"abstractsmartset" to subclasses.


You should probably still have a toset() on abstractsmartset if you intend
for it to be a mandatory part of the interface (which I recommend, per
Liskov), but have it either do set(self) or (probably better) raise
NotImplementedError.


Unless I'm reading it wrong, he did add that.



Good advice by the way!


Thanks! Glad you like it. :)
Augie Fackler - June 3, 2017, 10:51 p.m.
> On Jun 3, 2017, at 6:48 PM, Martin von Zweigbergk <martinvonz@google.com> wrote:
> 
> 
> 
> On Jun 3, 2017 3:36 PM, "Augie Fackler" <raf@durin42.com> wrote:
> 
>> On Jun 3, 2017, at 4:58 PM, Jun Wu <quark@fb.com> wrote:
>> 
>> Excerpts from Jun Wu's message of 2017-06-03 13:54:18 -0700:
>>> Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:
>>>> I like where you're headed with this,
> Where is Jun headed? It'd be nice to see the user of this new method and see the it helps there (in practice). The change is simple and makes sense, but it's still dead code at this point.

There are a reasonable number of places where we turn a smartset into a regular set, and right now there’s a fair amount of bonus overhead for some of the smartset instances because they already have a set that could be quickly copied by cpython all in native code, but instead we add a layer of iterator indirection which requires a lot of bouncing between native code and bytecode. For cases that can efficiently just dupe an existing set and immediately return it, this will be a nice little win.

>>>> but I think I'd rather have
>>>> classes with an efficient way to produce a set implement toset
>>>> themselves and leave the default implementation as set(self) so we can
>>>> avoid the spooky action at a distance implied by this hasattr().
>>> 
>>> The problem is the lack of ability to customize "set(smartset)" logic in the
>>> Python language. "set" will always treat "smartset" as an iterable, which
>>> makes the operation O(N log N) even if we have O(1) fast path.
>>> 
>>> Between syntactic simplicity and ability to have fast code path, I want to
>>> choose the latter.
>>> 
>>> For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
>>> looks like a sign of poorly designed APIs (sometimes return a smartset,
>>> sometimes return a set, so you never know what "x" is). It shouldn't happen
>>> in a clean codebase.
>> 
>> Sorry. I misunderstood the whole paragraph. I'll move "toset" from
>> "abstractsmartset" to subclasses.
> 
> You should probably still have a toset() on abstractsmartset if you intend for it to be a mandatory part of the interface (which I recommend, per Liskov), but have it either do set(self) or (probably better) raise NotImplementedError.
> 
> Unless I'm reading it wrong, he did add that.

I meant to keep it in a v2, without the hasattr() abstraction layer violation. :)

> 
> 
>> 
>> Good advice by the way!
> 
> Thanks! Glad you like it. :)
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> 
>
via Mercurial-devel - June 3, 2017, 11 p.m.
On Jun 3, 2017 3:51 PM, "Augie Fackler" <raf@durin42.com> wrote:


> On Jun 3, 2017, at 6:48 PM, Martin von Zweigbergk <martinvonz@google.com>
wrote:
>
>
>
> On Jun 3, 2017 3:36 PM, "Augie Fackler" <raf@durin42.com> wrote:
>
>> On Jun 3, 2017, at 4:58 PM, Jun Wu <quark@fb.com> wrote:
>>
>> Excerpts from Jun Wu's message of 2017-06-03 13:54:18 -0700:
>>> Excerpts from Augie Fackler's message of 2017-06-03 16:25:52 -0400:
>>>> I like where you're headed with this,
> Where is Jun headed? It'd be nice to see the user of this new method and
see the it helps there (in practice). The change is simple and makes sense,
but it's still dead code at this point.

There are a reasonable number of places where we turn a smartset into a
regular set, and right now there’s a fair amount of bonus overhead for some
of the smartset instances because they already have a set that could be
quickly copied by cpython all in native code, but instead we add a layer of
iterator indirection which requires a lot of bouncing between native code
and bytecode. For cases that can efficiently just dupe an existing set and
immediately return it, this will be a nice little win.


And I suppose in at least one of those, the smartset is a baseset? Sounds
good to me then.


>>>> but I think I'd rather have
>>>> classes with an efficient way to produce a set implement toset
>>>> themselves and leave the default implementation as set(self) so we can
>>>> avoid the spooky action at a distance implied by this hasattr().
>>>
>>> The problem is the lack of ability to customize "set(smartset)" logic
in the
>>> Python language. "set" will always treat "smartset" as an iterable,
which
>>> makes the operation O(N log N) even if we have O(1) fast path.
>>>
>>> Between syntactic simplicity and ability to have fast code path, I want
to
>>> choose the latter.
>>>
>>> For "hasattr", do you mean we might need to do hasattr(x, 'toset')? That
>>> looks like a sign of poorly designed APIs (sometimes return a smartset,
>>> sometimes return a set, so you never know what "x" is). It shouldn't
happen
>>> in a clean codebase.
>>
>> Sorry. I misunderstood the whole paragraph. I'll move "toset" from
>> "abstractsmartset" to subclasses.
>
> You should probably still have a toset() on abstractsmartset if you
intend for it to be a mandatory part of the interface (which I recommend,
per Liskov), but have it either do set(self) or (probably better) raise
NotImplementedError.
>
> Unless I'm reading it wrong, he did add that.

I meant to keep it in a v2, without the hasattr() abstraction layer
violation. :)


Maybe it's just poor diff context, but it looks like the function is on
abstractsmartset in v2 and without the hasattr hack.


>
>
>>
>> Good advice by the way!
>
> Thanks! Glad you like it. :)
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
Augie Fackler - June 3, 2017, 11:05 p.m.
> On Jun 3, 2017, at 7:00 PM, Martin von Zweigbergk <martinvonz@google.com> wrote:
> 
> I meant to keep it in a v2, without the hasattr() abstraction layer violation. :)
> 
> Maybe it's just poor diff context, but it looks like the function is on abstractsmartset in v2 and without the hasattr hack.

I haven’t looked at v2 yet. :)

Patch

diff --git a/mercurial/smartset.py b/mercurial/smartset.py
--- a/mercurial/smartset.py
+++ b/mercurial/smartset.py
@@ -156,4 +156,10 @@  class abstractsmartset(object):
         return filteredset(self, condition, condrepr)
 
+    def toset(self):
+        """Convert to unordered Python set"""
+        if not util.safehasattr(self, '_set'):
+            self._set = set(self)
+        return self._set
+
 class baseset(abstractsmartset):
     """Basic data structure that represents a revset and contains the basic
@@ -169,5 +175,6 @@  class baseset(abstractsmartset):
 
     Construct by a set:
-    >>> xs = baseset(set(x))
+    >>> xset = set(x)
+    >>> xs = baseset(xset)
     >>> ys = baseset(set(y))
     >>> [list(i) for i in [xs + ys, xs & ys, xs - ys]]
@@ -175,4 +182,6 @@  class baseset(abstractsmartset):
     >>> [type(i).__name__ for i in [xs + ys, xs & ys, xs - ys]]
     ['addset', 'baseset', 'baseset']
+    >>> xset is xs.toset()
+    True
 
     Construct by a list-like:
@@ -584,4 +593,10 @@  class addset(abstractsmartset):
     >>> [x for x in rs]
     [5, 4, 3, 2, 0]
+
+    convert to Python set:
+    >>> rs.toset() is rs.toset()
+    True
+    >>> sorted(rs.toset())
+    [0, 2, 3, 4, 5]
     """
     def __init__(self, revs1, revs2, ascending=None):