Patchwork [1,of,6,V2] hidden: add a function returning ancestors of revs within a domain

login
register
mail settings
Submitter Pierre-Yves David
Date May 23, 2017, 8:02 p.m.
Message ID <5f964af88a0fae242ce2.1495569749@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/20865/
State Changes Requested
Headers show

Comments

Pierre-Yves David - May 23, 2017, 8:02 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1495372906 -7200
#      Sun May 21 15:21:46 2017 +0200
# Node ID 5f964af88a0fae242ce24b0478c676d2056e0dc6
# Parent  8db2feb04cebc1878c6232dd2650f2c5468d350e
# EXP-Topic fast-compute-hidden
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 5f964af88a0f
hidden: add a function returning ancestors of revs within a domain

See documentation for details. This will be used to improve the hidden
computation algorithm. See new changesets for usage.
via Mercurial-devel - May 24, 2017, 12:04 a.m.
On Tue, May 23, 2017 at 1:02 PM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1495372906 -7200
> #      Sun May 21 15:21:46 2017 +0200
> # Node ID 5f964af88a0fae242ce24b0478c676d2056e0dc6
> # Parent  8db2feb04cebc1878c6232dd2650f2c5468d350e
> # EXP-Topic fast-compute-hidden
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 5f964af88a0f
> hidden: add a function returning ancestors of revs within a domain
>
> See documentation for details. This will be used to improve the hidden
> computation algorithm. See new changesets for usage.
>
> diff --git a/mercurial/repoview.py b/mercurial/repoview.py
> --- a/mercurial/repoview.py
> +++ b/mercurial/repoview.py
> @@ -92,6 +92,31 @@ def _getstatichidden(repo):
>                      heappush(heap, -parent)
>      return hidden
>
> +def _domainancestors(pfunc, revs, domain):
> +    """return ancestors of 'revs' within 'domain'
> +
> +    - pfunc(r): a funtion returning parent of 'r',
> +    - revs: iterable of revnum,
> +    - domain: consistent set of revnum.
> +
> +    The domain must be consistent: no connected subset are the ancestors of
> +    another connected subset. In other words, if the parents of a revision are
> +    not in the domains, no other ancestors of that revision.
> +
> +    (Ancestors are returned inclusively, 'revs' must be within the domain or
> +    direct children of it.)
> +    """
> +    stack = list(revs)
> +    ancs = set(stack)
> +    while stack:
> +        for p in pfunc(stack.pop()):
> +            nbanc = len(ancs)

s/nb/num/ maybe? I'd also prefer to have "ancestors" spelled out
(which seems ~3x more common so far in our code base according to
"grep -E 'anc(estor)?s = ' mercurial/*.py").

> +            if p != nullrev and p in domain:
> +                ancs.add(p)
> +            if nbanc != len(ancs): # avoid double membership testing
> +                stack.append(p)

The following seems easier to read than the above 5 lines. Does the
comment imply that it's noticeably slower?

if p != nullrev and p in domain and not p in ancs:
  ancs.add(p)
  stack.append(p)

> +    return ancs
> +
>  cacheversion = 1
>  cachefile = 'cache/hidden'
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pierre-Yves David - May 24, 2017, 1:08 p.m.
On 05/24/2017 02:04 AM, Martin von Zweigbergk wrote:
> On Tue, May 23, 2017 at 1:02 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>> # Date 1495372906 -7200
>> #      Sun May 21 15:21:46 2017 +0200
>> # Node ID 5f964af88a0fae242ce24b0478c676d2056e0dc6
>> # Parent  8db2feb04cebc1878c6232dd2650f2c5468d350e
>> # EXP-Topic fast-compute-hidden
>> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 5f964af88a0f
>> hidden: add a function returning ancestors of revs within a domain
>>
>> See documentation for details. This will be used to improve the hidden
>> computation algorithm. See new changesets for usage.
>>
>> diff --git a/mercurial/repoview.py b/mercurial/repoview.py
>> --- a/mercurial/repoview.py
>> +++ b/mercurial/repoview.py
>> @@ -92,6 +92,31 @@ def _getstatichidden(repo):
>>                      heappush(heap, -parent)
>>      return hidden
>>
>> +def _domainancestors(pfunc, revs, domain):
>> +    """return ancestors of 'revs' within 'domain'
>> +
>> +    - pfunc(r): a funtion returning parent of 'r',
>> +    - revs: iterable of revnum,
>> +    - domain: consistent set of revnum.
>> +
>> +    The domain must be consistent: no connected subset are the ancestors of
>> +    another connected subset. In other words, if the parents of a revision are
>> +    not in the domains, no other ancestors of that revision.
>> +
>> +    (Ancestors are returned inclusively, 'revs' must be within the domain or
>> +    direct children of it.)
>> +    """
>> +    stack = list(revs)
>> +    ancs = set(stack)
>> +    while stack:
>> +        for p in pfunc(stack.pop()):
>> +            nbanc = len(ancs)
>
> s/nb/num/ maybe?

I prefer "nb", but I can do both.

Do you want that renamed to "num" in the next version [Yn]

> I'd also prefer to have "ancestors" spelled out
> (which seems ~3x more common so far in our code base according to
> "grep -E 'anc(estor)?s = ' mercurial/*.py").

This does not lead to any lne overflow, I can will update that in the 
next version.

>> +            if p != nullrev and p in domain:
>> +                ancs.add(p)
>> +            if nbanc != len(ancs): # avoid double membership testing
>> +                stack.append(p)
>
> The following seems easier to read than the above 5 lines. Does the
> comment imply that it's noticeably slower?
>
> if p != nullrev and p in domain and not p in ancs:
>   ancs.add(p)
>   stack.append(p)

The _getstatichidden code is using this hack[1]. So I assumed it was 
been useful in the past and kept that around. Given the large drop in 
complexity, this is likely less important. However given how critical 
the performance of this patch are, I would rather cargo cult this.

[1] 
https://www.mercurial-scm.org/repo/hg/file/bdc4861ffe59/mercurial/repoview.py#l68

Cheers,
via Mercurial-devel - May 24, 2017, 5:23 p.m.
On Wed, May 24, 2017 at 6:08 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> On 05/24/2017 02:04 AM, Martin von Zweigbergk wrote:
>>
>> On Tue, May 23, 2017 at 1:02 PM, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>>>
>>> # HG changeset patch
>>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>>> # Date 1495372906 -7200
>>> #      Sun May 21 15:21:46 2017 +0200
>>> # Node ID 5f964af88a0fae242ce24b0478c676d2056e0dc6
>>> # Parent  8db2feb04cebc1878c6232dd2650f2c5468d350e
>>> # EXP-Topic fast-compute-hidden
>>> # Available At
>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>>> #              hg pull
>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 5f964af88a0f
>>> hidden: add a function returning ancestors of revs within a domain
>>>
>>> See documentation for details. This will be used to improve the hidden
>>> computation algorithm. See new changesets for usage.
>>>
>>> diff --git a/mercurial/repoview.py b/mercurial/repoview.py
>>> --- a/mercurial/repoview.py
>>> +++ b/mercurial/repoview.py
>>> @@ -92,6 +92,31 @@ def _getstatichidden(repo):
>>>                      heappush(heap, -parent)
>>>      return hidden
>>>
>>> +def _domainancestors(pfunc, revs, domain):
>>> +    """return ancestors of 'revs' within 'domain'
>>> +
>>> +    - pfunc(r): a funtion returning parent of 'r',
>>> +    - revs: iterable of revnum,
>>> +    - domain: consistent set of revnum.
>>> +
>>> +    The domain must be consistent: no connected subset are the ancestors
>>> of
>>> +    another connected subset. In other words, if the parents of a
>>> revision are
>>> +    not in the domains, no other ancestors of that revision.
>>> +
>>> +    (Ancestors are returned inclusively, 'revs' must be within the
>>> domain or
>>> +    direct children of it.)
>>> +    """
>>> +    stack = list(revs)
>>> +    ancs = set(stack)
>>> +    while stack:
>>> +        for p in pfunc(stack.pop()):
>>> +            nbanc = len(ancs)
>>
>>
>> s/nb/num/ maybe?
>
>
> I prefer "nb", but I can do both.
>
> Do you want that renamed to "num" in the next version [Yn]

Yes. Seems unlikely I would ask otherwise, no? :-P

I did do a quick search for both forms to make sure I was suggesting
something that seems common. Feel free to do that yourself and let me
know if you find that nb is the more common prefix.

>
>> I'd also prefer to have "ancestors" spelled out
>> (which seems ~3x more common so far in our code base according to
>> "grep -E 'anc(estor)?s = ' mercurial/*.py").
>
>
> This does not lead to any lne overflow, I can will update that in the next
> version.
>
>>> +            if p != nullrev and p in domain:
>>> +                ancs.add(p)
>>> +            if nbanc != len(ancs): # avoid double membership testing
>>> +                stack.append(p)
>>
>>
>> The following seems easier to read than the above 5 lines. Does the
>> comment imply that it's noticeably slower?
>>
>> if p != nullrev and p in domain and not p in ancs:
>>   ancs.add(p)
>>   stack.append(p)
>
>
> The _getstatichidden code is using this hack[1]. So I assumed it was been
> useful in the past and kept that around. Given the large drop in complexity,
> this is likely less important. However given how critical the performance of
> this patch are, I would rather cargo cult this.

Fair enough, but note the code you're cargo-culting (from yourself,
btw) seems to be slower than what I'm proposing.

$ python -m timeit -s 'a=[i for i in range(1000)]' 'b=set()
for x in a:
  pre = len(b)
  b.add(x)
  if len(b) > pre:
    pass
'
10000 loops, best of 3: 149 usec per loop

$ python -m timeit -s 'a=[i for i in range(1000)]' 'b=set()
for x in a:
  if x not in b:
    b.add(x)
'
10000 loops, best of 3: 96.6 usec per loop

In case you're thinking that I'm trying to trick you by not including
data for when there are lots of merges so the set would be updated
less frequently, here it is:

$ python -m timeit -s 'a=[i/2 for i in range(1000)]' 'b=set()
for x in a:
  pre = len(b)
  b.add(x)
  if len(b) > pre:
    pass
'
10000 loops, best of 3: 163 usec per loop

$ python -m timeit -s 'a=[i/2 for i in range(1000)]' 'b=set()
for x in a:
  if x not in b:
    b.add(x)
'
10000 loops, best of 3: 63.5 usec per loop

Or am I missing something?

>
> [1]
> https://www.mercurial-scm.org/repo/hg/file/bdc4861ffe59/mercurial/repoview.py#l68
>
> Cheers,
>
> --
> Pierre-Yves David
Pierre-Yves David - May 24, 2017, 6:11 p.m.
On 05/24/2017 07:23 PM, Martin von Zweigbergk wrote:
> On Wed, May 24, 2017 at 6:08 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>> On 05/24/2017 02:04 AM, Martin von Zweigbergk wrote:
>>>
>>> On Tue, May 23, 2017 at 1:02 PM, Pierre-Yves David
>>> <pierre-yves.david@ens-lyon.org> wrote:
>>>>
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>>>> # Date 1495372906 -7200
>>>> #      Sun May 21 15:21:46 2017 +0200
>>>> # Node ID 5f964af88a0fae242ce24b0478c676d2056e0dc6
>>>> # Parent  8db2feb04cebc1878c6232dd2650f2c5468d350e
>>>> # EXP-Topic fast-compute-hidden
>>>> # Available At
>>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
>>>> #              hg pull
>>>> https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 5f964af88a0f
>>>> hidden: add a function returning ancestors of revs within a domain
>>>>
>>>> See documentation for details. This will be used to improve the hidden
>>>> computation algorithm. See new changesets for usage.
>>>>
>>>> diff --git a/mercurial/repoview.py b/mercurial/repoview.py
>>>> --- a/mercurial/repoview.py
>>>> +++ b/mercurial/repoview.py
>>>> @@ -92,6 +92,31 @@ def _getstatichidden(repo):
>>>>                      heappush(heap, -parent)
>>>>      return hidden
>>>>
>>>> +def _domainancestors(pfunc, revs, domain):
>>>> +    """return ancestors of 'revs' within 'domain'
>>>> +
>>>> +    - pfunc(r): a funtion returning parent of 'r',
>>>> +    - revs: iterable of revnum,
>>>> +    - domain: consistent set of revnum.
>>>> +
>>>> +    The domain must be consistent: no connected subset are the ancestors
>>>> of
>>>> +    another connected subset. In other words, if the parents of a
>>>> revision are
>>>> +    not in the domains, no other ancestors of that revision.
>>>> +
>>>> +    (Ancestors are returned inclusively, 'revs' must be within the
>>>> domain or
>>>> +    direct children of it.)
>>>> +    """
>>>> +    stack = list(revs)
>>>> +    ancs = set(stack)
>>>> +    while stack:
>>>> +        for p in pfunc(stack.pop()):
>>>> +            nbanc = len(ancs)
>>>
>>>
>>> s/nb/num/ maybe?
>>
>>
>> I prefer "nb", but I can do both.
>>
>> Do you want that renamed to "num" in the next version [Yn]
>
> Yes. Seems unlikely I would ask otherwise, no? :-P
>
> I did do a quick search for both forms to make sure I was suggesting
> something that seems common. Feel free to do that yourself and let me
> know if you find that nb is the more common prefix.

I'll rename it. The question not worth the CPU-time spent greping that :-)

>>> I'd also prefer to have "ancestors" spelled out
>>> (which seems ~3x more common so far in our code base according to
>>> "grep -E 'anc(estor)?s = ' mercurial/*.py").
>>
>>
>> This does not lead to any lne overflow, I can will update that in the next
>> version.
>>
>>>> +            if p != nullrev and p in domain:
>>>> +                ancs.add(p)
>>>> +            if nbanc != len(ancs): # avoid double membership testing
>>>> +                stack.append(p)
>>>
>>>
>>> The following seems easier to read than the above 5 lines. Does the
>>> comment imply that it's noticeably slower?
>>>
>>> if p != nullrev and p in domain and not p in ancs:
>>>   ancs.add(p)
>>>   stack.append(p)
>>
>>
>> The _getstatichidden code is using this hack[1]. So I assumed it was been
>> useful in the past and kept that around. Given the large drop in complexity,
>> this is likely less important. However given how critical the performance of
>> this patch are, I would rather cargo cult this.
>
> Fair enough, but note the code you're cargo-culting (from yourself,
> btw) seems to be slower than what I'm proposing.

Thanks for looking into that. I'll kill the complicated form.

Cheers,

Patch

diff --git a/mercurial/repoview.py b/mercurial/repoview.py
--- a/mercurial/repoview.py
+++ b/mercurial/repoview.py
@@ -92,6 +92,31 @@  def _getstatichidden(repo):
                     heappush(heap, -parent)
     return hidden
 
+def _domainancestors(pfunc, revs, domain):
+    """return ancestors of 'revs' within 'domain'
+
+    - pfunc(r): a funtion returning parent of 'r',
+    - revs: iterable of revnum,
+    - domain: consistent set of revnum.
+
+    The domain must be consistent: no connected subset are the ancestors of
+    another connected subset. In other words, if the parents of a revision are
+    not in the domains, no other ancestors of that revision.
+
+    (Ancestors are returned inclusively, 'revs' must be within the domain or
+    direct children of it.)
+    """
+    stack = list(revs)
+    ancs = set(stack)
+    while stack:
+        for p in pfunc(stack.pop()):
+            nbanc = len(ancs)
+            if p != nullrev and p in domain:
+                ancs.add(p)
+            if nbanc != len(ancs): # avoid double membership testing
+                stack.append(p)
+    return ancs
+
 cacheversion = 1
 cachefile = 'cache/hidden'