Patchwork [3,of,4] revset: add helper to build balanced addsets from chained 'or' operations

login
register
mail settings
Submitter Yuya Nishihara
Date May 26, 2015, 3 p.m.
Message ID <f8cd0e0ceea631cb6d22.1432652421@mimosa>
Download mbox | patch
Permalink /patch/9278/
State Superseded
Commit 036c26b08b71dbeb9261772ae788f0936ead8e46
Headers show

Comments

Yuya Nishihara - May 26, 2015, 3 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1432444252 -32400
#      Sun May 24 14:10:52 2015 +0900
# Node ID f8cd0e0ceea631cb6d22b324bfeeb7fecee426fb
# Parent  70591283d99dcc442f0e7bdb9afa603eb18bfac1
revset: add helper to build balanced addsets from chained 'or' operations

This function will be used to reduce the stack depth from O(n) to O(log(n)).
It is public function because we'll use it later to fix stack overflow at
scmutil.revrange().
Pierre-Yves David - May 26, 2015, 5:59 p.m.
On 05/26/2015 08:00 AM, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1432444252 -32400
> #      Sun May 24 14:10:52 2015 +0900
> # Node ID f8cd0e0ceea631cb6d22b324bfeeb7fecee426fb
> # Parent  70591283d99dcc442f0e7bdb9afa603eb18bfac1
> revset: add helper to build balanced addsets from chained 'or' operations
>
> This function will be used to reduce the stack depth from O(n) to O(log(n)).
> It is public function because we'll use it later to fix stack overflow at
> scmutil.revrange().
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2940,6 +2940,17 @@ class filteredset(abstractsmartset):
>       def __repr__(self):
>           return '<%s %r>' % (type(self).__name__, self._subset)
>
> +def combinesets(subsets):
> +    """Create balanced tree of addsets representing union of given sets"""
> +    if not subsets:
> +        return baseset()
> +    if len(subsets) == 1:
> +        return subsets[0]
> +    p = len(subsets) // 2
> +    xs = combinesets(subsets[:p])
> +    ys = combinesets(subsets[p:])
> +    return addset(xs, ys)
> +

My idea for this was to include that directly into the addset 
constructor. The main motivation for it is to keep a single interface 
for adding stuff.

class addset(…):

     def __init__(self, *subsets, ascending=None): #has to be **kwargs
         assert 1< len(subsets)
         if 2 < len(subsets):
             p = len(subsets) // 2
             revs1 = addset(*subsets[:p])
             revs2 = subsets[p:]
             if len(revs2) == 1:
                 revs2 = revs2[0]
             else:
                 revs2 = addset(revs2)
         self._r1 = revs1
         self._r2 = revs2
Yuya Nishihara - May 26, 2015, 11 p.m.
On Tue, 26 May 2015 10:59:27 -0700, Pierre-Yves David wrote:
> On 05/26/2015 08:00 AM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1432444252 -32400
> > #      Sun May 24 14:10:52 2015 +0900
> > # Node ID f8cd0e0ceea631cb6d22b324bfeeb7fecee426fb
> > # Parent  70591283d99dcc442f0e7bdb9afa603eb18bfac1
> > revset: add helper to build balanced addsets from chained 'or' operations
> >
> > This function will be used to reduce the stack depth from O(n) to O(log(n)).
> > It is public function because we'll use it later to fix stack overflow at
> > scmutil.revrange().
> >
> > diff --git a/mercurial/revset.py b/mercurial/revset.py
> > --- a/mercurial/revset.py
> > +++ b/mercurial/revset.py
> > @@ -2940,6 +2940,17 @@ class filteredset(abstractsmartset):
> >       def __repr__(self):
> >           return '<%s %r>' % (type(self).__name__, self._subset)
> >
> > +def combinesets(subsets):
> > +    """Create balanced tree of addsets representing union of given sets"""
> > +    if not subsets:
> > +        return baseset()
> > +    if len(subsets) == 1:
> > +        return subsets[0]
> > +    p = len(subsets) // 2
> > +    xs = combinesets(subsets[:p])
> > +    ys = combinesets(subsets[p:])
> > +    return addset(xs, ys)
> > +
> 
> My idea for this was to include that directly into the addset 
> constructor. The main motivation for it is to keep a single interface 
> for adding stuff.
> 
> class addset(…):
> 
>      def __init__(self, *subsets, ascending=None): #has to be **kwargs
>          assert 1< len(subsets)
>          if 2 < len(subsets):
>              p = len(subsets) // 2
>              revs1 = addset(*subsets[:p])
>              revs2 = subsets[p:]
>              if len(revs2) == 1:
>                  revs2 = revs2[0]
>              else:
>                  revs2 = addset(revs2)
>          self._r1 = revs1
>          self._r2 = revs2

It could, but "combinesets([]) -> baseset()" is somewhat useful in revrange().
Pierre-Yves David - May 26, 2015, 11:08 p.m.
On 05/26/2015 04:00 PM, Yuya Nishihara wrote:
> On Tue, 26 May 2015 10:59:27 -0700, Pierre-Yves David wrote:
>> On 05/26/2015 08:00 AM, Yuya Nishihara wrote:
>>> # HG changeset patch
>>> # User Yuya Nishihara <yuya@tcha.org>
>>> # Date 1432444252 -32400
>>> #      Sun May 24 14:10:52 2015 +0900
>>> # Node ID f8cd0e0ceea631cb6d22b324bfeeb7fecee426fb
>>> # Parent  70591283d99dcc442f0e7bdb9afa603eb18bfac1
>>> revset: add helper to build balanced addsets from chained 'or' operations
>>>
>>> This function will be used to reduce the stack depth from O(n) to O(log(n)).
>>> It is public function because we'll use it later to fix stack overflow at
>>> scmutil.revrange().
>>>
>>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>>> --- a/mercurial/revset.py
>>> +++ b/mercurial/revset.py
>>> @@ -2940,6 +2940,17 @@ class filteredset(abstractsmartset):
>>>        def __repr__(self):
>>>            return '<%s %r>' % (type(self).__name__, self._subset)
>>>
>>> +def combinesets(subsets):
>>> +    """Create balanced tree of addsets representing union of given sets"""
>>> +    if not subsets:
>>> +        return baseset()
>>> +    if len(subsets) == 1:
>>> +        return subsets[0]
>>> +    p = len(subsets) // 2
>>> +    xs = combinesets(subsets[:p])
>>> +    ys = combinesets(subsets[p:])
>>> +    return addset(xs, ys)
>>> +
>>
>> My idea for this was to include that directly into the addset
>> constructor. The main motivation for it is to keep a single interface
>> for adding stuff.
>>
>> class addset(…):
>>
>>       def __init__(self, *subsets, ascending=None): #has to be **kwargs
>>           assert 1< len(subsets)
>>           if 2 < len(subsets):
>>               p = len(subsets) // 2
>>               revs1 = addset(*subsets[:p])
>>               revs2 = subsets[p:]
>>               if len(revs2) == 1:
>>                   revs2 = revs2[0]
>>               else:
>>                   revs2 = addset(revs2)
>>           self._r1 = revs1
>>           self._r2 = revs2
>
> It could, but "combinesets([]) -> baseset()" is somewhat useful in revrange().

I think we will need a generic "promote this object to a baseset in 
place because we have soaked all the lazyness out of it" anyway. (yes 
this wil be black magic…)

We can also easily turn addset into an factory function if we have 
smarter code.

My main objective here is to reduce the official API surface.

If is likely we'lll a C version of addset that does not needs the 
balanced tree approach, we may even get a Python version for it. Keeping 
all this "balanced tree" magic behuind the simple "addset(…)" API seems 
simpler here.

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - May 27, 2015, 11:46 a.m.
On Tue, 26 May 2015 16:08:28 -0700, Pierre-Yves David wrote:
> On 05/26/2015 04:00 PM, Yuya Nishihara wrote:
> > On Tue, 26 May 2015 10:59:27 -0700, Pierre-Yves David wrote:
> >> On 05/26/2015 08:00 AM, Yuya Nishihara wrote:
> >>> # HG changeset patch
> >>> # User Yuya Nishihara <yuya@tcha.org>
> >>> # Date 1432444252 -32400
> >>> #      Sun May 24 14:10:52 2015 +0900
> >>> # Node ID f8cd0e0ceea631cb6d22b324bfeeb7fecee426fb
> >>> # Parent  70591283d99dcc442f0e7bdb9afa603eb18bfac1
> >>> revset: add helper to build balanced addsets from chained 'or' operations
> >>>
> >>> This function will be used to reduce the stack depth from O(n) to O(log(n)).
> >>> It is public function because we'll use it later to fix stack overflow at
> >>> scmutil.revrange().
> >>>
> >>> diff --git a/mercurial/revset.py b/mercurial/revset.py
> >>> --- a/mercurial/revset.py
> >>> +++ b/mercurial/revset.py
> >>> @@ -2940,6 +2940,17 @@ class filteredset(abstractsmartset):
> >>>        def __repr__(self):
> >>>            return '<%s %r>' % (type(self).__name__, self._subset)
> >>>
> >>> +def combinesets(subsets):
> >>> +    """Create balanced tree of addsets representing union of given sets"""
> >>> +    if not subsets:
> >>> +        return baseset()
> >>> +    if len(subsets) == 1:
> >>> +        return subsets[0]
> >>> +    p = len(subsets) // 2
> >>> +    xs = combinesets(subsets[:p])
> >>> +    ys = combinesets(subsets[p:])
> >>> +    return addset(xs, ys)
> >>> +
> >>
> >> My idea for this was to include that directly into the addset
> >> constructor. The main motivation for it is to keep a single interface
> >> for adding stuff.
> >>
> >> class addset(…):
> >>
> >>       def __init__(self, *subsets, ascending=None): #has to be **kwargs
> >>           assert 1< len(subsets)
> >>           if 2 < len(subsets):
> >>               p = len(subsets) // 2
> >>               revs1 = addset(*subsets[:p])
> >>               revs2 = subsets[p:]
> >>               if len(revs2) == 1:
> >>                   revs2 = revs2[0]
> >>               else:
> >>                   revs2 = addset(revs2)
> >>           self._r1 = revs1
> >>           self._r2 = revs2
> >
> > It could, but "combinesets([]) -> baseset()" is somewhat useful in revrange().
> 
> I think we will need a generic "promote this object to a baseset in 
> place because we have soaked all the lazyness out of it" anyway. (yes 
> this wil be black magic…)
> 
> We can also easily turn addset into an factory function if we have 
> smarter code.
> 
> My main objective here is to reduce the official API surface.

Can we make it a private API and scmutil.revrange() use it _for now_?

At some point, we'll be able to drop the old-style parser from revrange().
After that, revrange() can be rewritten as follows:

  scmutil.revrange(repo, revs)
    revset.matchany(ui, specs=revs, repo)
      trees = [parse(s) for s in specs]
      optimize(('or',) + trees)  # optimize everything at once for better result
      ...

Here _combinesets() will no longer be used by the other modules.

Regards,

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2940,6 +2940,17 @@  class filteredset(abstractsmartset):
     def __repr__(self):
         return '<%s %r>' % (type(self).__name__, self._subset)
 
+def combinesets(subsets):
+    """Create balanced tree of addsets representing union of given sets"""
+    if not subsets:
+        return baseset()
+    if len(subsets) == 1:
+        return subsets[0]
+    p = len(subsets) // 2
+    xs = combinesets(subsets[:p])
+    ys = combinesets(subsets[p:])
+    return addset(xs, ys)
+
 def _iterordered(ascending, iter1, iter2):
     """produce an ordered iteration from two iterators with the same order