Patchwork [2,of,2] revlog: do inclusive descendant testing (API)

login
register
mail settings
Submitter Paul Morelle
Date June 21, 2018, 2:24 p.m.
Message ID <59fea52e54e014722486.1529591069@belenos.localdomain>
Download mbox | patch
Permalink /patch/32363/
State New
Headers show

Comments

Paul Morelle - June 21, 2018, 2:24 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1529585081 -3600
#      Thu Jun 21 13:44:41 2018 +0100
# Node ID 59fea52e54e014722486f7c049e192fa505032d8
# Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
# EXP-Topic descendant
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 59fea52e54e0
revlog: do inclusive descendant testing (API)

In many other places, a revision is considered a descendant of itself.
We update the behavior of `revlog.descendant()` to match this.

No test breaks, so it seems safe to do so.
via Mercurial-devel - June 21, 2018, 4:30 p.m.
On Thu, Jun 21, 2018 at 7:24 AM Paul Morelle <paul.morelle@octobus.net>
wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1529585081 -3600
> #      Thu Jun 21 13:44:41 2018 +0100
> # Node ID 59fea52e54e014722486f7c049e192fa505032d8
> # Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
> # EXP-Topic descendant
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 59fea52e54e0
> revlog: do inclusive descendant testing (API)
>
> In many other places, a revision is considered a descendant of itself.
> We update the behavior of `revlog.descendant()` to match this.
>
> No test breaks, so it seems safe to do so.
>
> diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
> --- a/mercurial/revlog.py       Thu Jun 21 13:32:07 2018 +0100
> +++ b/mercurial/revlog.py       Thu Jun 21 13:44:41 2018 +0100
> @@ -1378,7 +1378,7 @@
>      def descendant(self, start, end):
>          if start == nullrev:
>              return True
> -        return start in self.ancestors([end])
> +        return start in self.ancestors([end], inclusive=True)
>

Is this now equivalent to self.isancestor(start, end)? That method relies
on commonancestorsheads instead of lazyancestors. What are the performance
trade-offs? Equivalent both when there are many ancestors and when there
are many descendants?


>      def commonancestorsheads(self, a, b):
>          """calculate all the heads of the common ancestors of nodes a and
> b"""
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Paul Morelle - June 25, 2018, 4:54 p.m.
On 21/06/18 18:30, Martin von Zweigbergk wrote:
> On Thu, Jun 21, 2018 at 7:24 AM Paul Morelle <paul.morelle@octobus.net
> <mailto:paul.morelle@octobus.net>> wrote:
>
>     # HG changeset patch
>     # User Boris Feld <boris.feld@octobus.net
>     <mailto:boris.feld@octobus.net>>
>     # Date 1529585081 -3600
>     #      Thu Jun 21 13:44:41 2018 +0100
>     # Node ID 59fea52e54e014722486f7c049e192fa505032d8
>     # Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
>     # EXP-Topic descendant
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r 59fea52e54e0
>     revlog: do inclusive descendant testing (API)
>
>     In many other places, a revision is considered a descendant of itself.
>     We update the behavior of `revlog.descendant()` to match this.
>
>     No test breaks, so it seems safe to do so.
>
>     diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
>     --- a/mercurial/revlog.py       Thu Jun 21 13:32:07 2018 +0100
>     +++ b/mercurial/revlog.py       Thu Jun 21 13:44:41 2018 +0100
>     @@ -1378,7 +1378,7 @@
>          def descendant(self, start, end):
>              if start == nullrev:
>                  return True
>     -        return start in self.ancestors([end])
>     +        return start in self.ancestors([end], inclusive=True)
>
>
> Is this now equivalent to self.isancestor(start, end)? That method
> relies on commonancestorsheads instead of lazyancestors. What are the
> performance trade-offs? Equivalent both when there are many ancestors
> and when there are many descendants?
Hello Martin,

Interestingly, it turns out that we have the following flock of functions:

  * ancestors: commonancestorsheads(parent_func, *revs)
      o uses revnum
      o any number of arguments
      o written in Python
  * cext/revlog.c: revlog.index.commonancestorsheads(*revs)
      o uses revnum
      o any number of arguments
      o written in C
  * revlog: revlog.commonancestorsheads(node-a, node-b)
      o uses nodes
      o takes exactly two nodes
      o Calls either self.index.c…a…heads  or ancestors.c…a…heads
  * revlog: revlog.isancestor(anc, desc)
      o uses nodes
      o calls revlog.commonancestorsheads
  * revlog: revlog.descendant(rev-a, rev-b)
      o uses revs
      o has it own very slow code
  * revlog: revlog.descendant(rev-a, rev-b)
      o uses revs
      o has it own very slow code
      o non-inclusive
  * context: context.descendant(other)
      o uses contexts
      o calls revlog.descendant
      o non-inclusive

At the algorithm level, `anc in ancestors(desc)` will be faster when anc
is not an ancestor of desc (or they are many gca), since it will finish
sooner. However given `commonancestorheads` benefits from a C
implementation, it is currently the fastest option.

In short terms, I think the following actions would make sense:

 1. Extract a lower level `revlog._commonancestorheads(*revs)` from
    `revlog.commonancestorsheads`
 2. Use it in `revlog.descendant`
 3. Make `revlog.isancestor` use `revlog.descendant`

Does this seems sensible to you?

--
Paul Morelle
via Mercurial-devel - June 27, 2018, 5:42 p.m.
On Mon, Jun 25, 2018 at 9:54 AM Paul Morelle <paul.morelle@octobus.net>
wrote:

> On 21/06/18 18:30, Martin von Zweigbergk wrote:
>
> On Thu, Jun 21, 2018 at 7:24 AM Paul Morelle <paul.morelle@octobus.net>
> wrote:
>
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1529585081 -3600
>> #      Thu Jun 21 13:44:41 2018 +0100
>> # Node ID 59fea52e54e014722486f7c049e192fa505032d8
>> # Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
>> # EXP-Topic descendant
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 59fea52e54e0
>> revlog: do inclusive descendant testing (API)
>>
>> In many other places, a revision is considered a descendant of itself.
>> We update the behavior of `revlog.descendant()` to match this.
>>
>> No test breaks, so it seems safe to do so.
>>
>> diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
>> --- a/mercurial/revlog.py       Thu Jun 21 13:32:07 2018 +0100
>> +++ b/mercurial/revlog.py       Thu Jun 21 13:44:41 2018 +0100
>> @@ -1378,7 +1378,7 @@
>>      def descendant(self, start, end):
>>          if start == nullrev:
>>              return True
>> -        return start in self.ancestors([end])
>> +        return start in self.ancestors([end], inclusive=True)
>>
>
> Is this now equivalent to self.isancestor(start, end)? That method relies
> on commonancestorsheads instead of lazyancestors. What are the performance
> trade-offs? Equivalent both when there are many ancestors and when there
> are many descendants?
>
> Hello Martin,
>
> Interestingly, it turns out that we have the following flock of functions:
>
>    - ancestors: commonancestorsheads(parent_func, *revs)
>       - uses revnum
>       - any number of arguments
>       - written in Python
>    - cext/revlog.c: revlog.index.commonancestorsheads(*revs)
>       - uses revnum
>       - any number of arguments
>       - written in C
>    - revlog: revlog.commonancestorsheads(node-a, node-b)
>       - uses nodes
>       - takes exactly two nodes
>       - Calls either self.index.c…a…heads  or ancestors.c…a…heads
>    - revlog: revlog.isancestor(anc, desc)
>       - uses nodes
>       - calls revlog.commonancestorsheads
>    - revlog: revlog.descendant(rev-a, rev-b)
>       - uses revs
>       - has it own very slow code
>    - revlog: revlog.descendant(rev-a, rev-b)
>       - uses revs
>       - has it own very slow code
>       - non-inclusive
>    - context: context.descendant(other)
>       - uses contexts
>       - calls revlog.descendant
>       - non-inclusive
>
> At the algorithm level, `anc in ancestors(desc)` will be faster when anc
> is not an ancestor of desc (or they are many gca), since it will finish
> sooner. However given `commonancestorheads` benefits from a C
> implementation, it is currently the fastest option.
>
> In short terms, I think the following actions would make sense:
>
>    1. Extract a lower level `revlog._commonancestorheads(*revs)` from
>    `revlog.commonancestorsheads`
>
> What would the new function do? commonancestorsheads is already pretty
short. Do you mean to just make it work on revnums instead of nodeids? Or
do you mean for making it optionally inclusive?

>
>    1. Use it in `revlog.descendant`
>    2. Make `revlog.isancestor` use `revlog.descendant`
>
> Why is that better than making descendant call isancestor?

Does this seems sensible to you?
>
> --
> Paul Morelle
>
Paul Morelle - June 29, 2018, 7:42 p.m.
On 27/06/18 19:42, Martin von Zweigbergk wrote:
>
>
> On Mon, Jun 25, 2018 at 9:54 AM Paul Morelle <paul.morelle@octobus.net
> <mailto:paul.morelle@octobus.net>> wrote:
>
>     On 21/06/18 18:30, Martin von Zweigbergk wrote:
>>     On Thu, Jun 21, 2018 at 7:24 AM Paul Morelle
>>     <paul.morelle@octobus.net <mailto:paul.morelle@octobus.net>> wrote:
>>
>>         # HG changeset patch
>>         # User Boris Feld <boris.feld@octobus.net
>>         <mailto:boris.feld@octobus.net>>
>>         # Date 1529585081 -3600
>>         #      Thu Jun 21 13:44:41 2018 +0100
>>         # Node ID 59fea52e54e014722486f7c049e192fa505032d8
>>         # Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
>>         # EXP-Topic descendant
>>         # Available At https://bitbucket.org/octobus/mercurial-devel/
>>         #              hg pull
>>         https://bitbucket.org/octobus/mercurial-devel/ -r 59fea52e54e0
>>         revlog: do inclusive descendant testing (API)
>>
>>         In many other places, a revision is considered a descendant
>>         of itself.
>>         We update the behavior of `revlog.descendant()` to match this.
>>
>>         No test breaks, so it seems safe to do so.
>>
>>         diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
>>         --- a/mercurial/revlog.py       Thu Jun 21 13:32:07 2018 +0100
>>         +++ b/mercurial/revlog.py       Thu Jun 21 13:44:41 2018 +0100
>>         @@ -1378,7 +1378,7 @@
>>              def descendant(self, start, end):
>>                  if start == nullrev:
>>                      return True
>>         -        return start in self.ancestors([end])
>>         +        return start in self.ancestors([end], inclusive=True)
>>
>>
>>     Is this now equivalent to self.isancestor(start, end)? That
>>     method relies on commonancestorsheads instead of lazyancestors.
>>     What are the performance trade-offs? Equivalent both when there
>>     are many ancestors and when there are many descendants?
>     Hello Martin,
>
>     Interestingly, it turns out that we have the following flock of
>     functions:
>
>       * ancestors: commonancestorsheads(parent_func, *revs)
>           o uses revnum
>           o any number of arguments
>           o written in Python
>       * cext/revlog.c: revlog.index.commonancestorsheads(*revs)
>           o uses revnum
>           o any number of arguments
>           o written in C
>       * revlog: revlog.commonancestorsheads(node-a, node-b)
>           o uses nodes
>           o takes exactly two nodes
>           o Calls either self.index.c…a…heads  or ancestors.c…a…heads
>       * revlog: revlog.isancestor(anc, desc)
>           o uses nodes
>           o calls revlog.commonancestorsheads
>       * revlog: revlog.descendant(rev-a, rev-b)
>           o uses revs
>           o has it own very slow code
>       * revlog: revlog.descendant(rev-a, rev-b)
>           o uses revs
>           o has it own very slow code
>           o non-inclusive
>       * context: context.descendant(other)
>           o uses contexts
>           o calls revlog.descendant
>           o non-inclusive
>
>     At the algorithm level, `anc in ancestors(desc)` will be faster
>     when anc is not an ancestor of desc (or they are many gca), since
>     it will finish sooner. However given `commonancestorheads`
>     benefits from a C implementation, it is currently the fastest option.
>
>     In short terms, I think the following actions would make sense:
>
>      1. Extract a lower level `revlog._commonancestorheads(*revs)`
>         from `revlog.commonancestorsheads`
>
> What would the new function do? commonancestorsheads is already pretty
> short. Do you mean to just make it work on revnums instead of nodeids?
> Or do you mean for making it optionally inclusive?
The new function would operate directly on revision numbers instead of
converting from/to nodes. The code of revlog.commonancestorsheads would
then look like this:

    def commonancestorsheads(self, a, b):
        a, b = self.rev(a), self.rev(b)
        ancs = self._commonancestorsheads(a, b)
        return pycompat.maplist(self.node, ancs)

>      2. Use it in `revlog.descendant`
>      3. Make `revlog.isancestor` use `revlog.descendant`
>
> Why is that better than making descendant call isancestor?
isancestor uses nodes, so we first need to convert the node to a rev,
and then call 'revlog.descendant'

--
Paul Morelle
via Mercurial-devel - June 29, 2018, 7:51 p.m.
On Fri, Jun 29, 2018 at 12:42 PM Paul Morelle <paul.morelle@octobus.net>
wrote:

> On 27/06/18 19:42, Martin von Zweigbergk wrote:
>
>
>
> On Mon, Jun 25, 2018 at 9:54 AM Paul Morelle <paul.morelle@octobus.net>
> wrote:
>
>> On 21/06/18 18:30, Martin von Zweigbergk wrote:
>>
>> On Thu, Jun 21, 2018 at 7:24 AM Paul Morelle <paul.morelle@octobus.net>
>> wrote:
>>
>>> # HG changeset patch
>>> # User Boris Feld <boris.feld@octobus.net>
>>> # Date 1529585081 -3600
>>> #      Thu Jun 21 13:44:41 2018 +0100
>>> # Node ID 59fea52e54e014722486f7c049e192fa505032d8
>>> # Parent  8d20b1b4b6a0297e7f9640d285b15a5d6647369e
>>> # EXP-Topic descendant
>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/
>>> -r 59fea52e54e0
>>> revlog: do inclusive descendant testing (API)
>>>
>>> In many other places, a revision is considered a descendant of itself.
>>> We update the behavior of `revlog.descendant()` to match this.
>>>
>>> No test breaks, so it seems safe to do so.
>>>
>>> diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
>>> --- a/mercurial/revlog.py       Thu Jun 21 13:32:07 2018 +0100
>>> +++ b/mercurial/revlog.py       Thu Jun 21 13:44:41 2018 +0100
>>> @@ -1378,7 +1378,7 @@
>>>      def descendant(self, start, end):
>>>          if start == nullrev:
>>>              return True
>>> -        return start in self.ancestors([end])
>>> +        return start in self.ancestors([end], inclusive=True)
>>>
>>
>> Is this now equivalent to self.isancestor(start, end)? That method relies
>> on commonancestorsheads instead of lazyancestors. What are the performance
>> trade-offs? Equivalent both when there are many ancestors and when there
>> are many descendants?
>>
>> Hello Martin,
>>
>> Interestingly, it turns out that we have the following flock of functions:
>>
>>    - ancestors: commonancestorsheads(parent_func, *revs)
>>       - uses revnum
>>       - any number of arguments
>>       - written in Python
>>    - cext/revlog.c: revlog.index.commonancestorsheads(*revs)
>>       - uses revnum
>>       - any number of arguments
>>       - written in C
>>    - revlog: revlog.commonancestorsheads(node-a, node-b)
>>       - uses nodes
>>       - takes exactly two nodes
>>       - Calls either self.index.c…a…heads  or ancestors.c…a…heads
>>    - revlog: revlog.isancestor(anc, desc)
>>       - uses nodes
>>       - calls revlog.commonancestorsheads
>>    - revlog: revlog.descendant(rev-a, rev-b)
>>       - uses revs
>>       - has it own very slow code
>>    - revlog: revlog.descendant(rev-a, rev-b)
>>       - uses revs
>>       - has it own very slow code
>>       - non-inclusive
>>    - context: context.descendant(other)
>>       - uses contexts
>>       - calls revlog.descendant
>>       - non-inclusive
>>
>> At the algorithm level, `anc in ancestors(desc)` will be faster when anc
>> is not an ancestor of desc (or they are many gca), since it will finish
>> sooner. However given `commonancestorheads` benefits from a C
>> implementation, it is currently the fastest option.
>>
>> In short terms, I think the following actions would make sense:
>>
>>    1. Extract a lower level `revlog._commonancestorheads(*revs)` from
>>    `revlog.commonancestorsheads`
>>
>> What would the new function do? commonancestorsheads is already pretty
> short. Do you mean to just make it work on revnums instead of nodeids? Or
> do you mean for making it optionally inclusive?
>
> The new function would operate directly on revision numbers instead of
> converting from/to nodes. The code of revlog.commonancestorsheads would
> then look like this:
>

Sounds good. You can also update rebase.py to use the new function then.


>
>     def commonancestorsheads(self, a, b):
>         a, b = self.rev(a), self.rev(b)
>         ancs = self._commonancestorsheads(a, b)
>         return pycompat.maplist(self.node, ancs)
>
>
>>    1. Use it in `revlog.descendant`
>>    2. Make `revlog.isancestor` use `revlog.descendant`
>>
>> Why is that better than making descendant call isancestor?
>
> isancestor uses nodes, so we first need to convert the node to a rev, and
> then call 'revlog.descendant'
>

I think that makes sense, thanks.


>
> --
> Paul Morelle
>

Patch

diff -r 8d20b1b4b6a0 -r 59fea52e54e0 mercurial/revlog.py
--- a/mercurial/revlog.py	Thu Jun 21 13:32:07 2018 +0100
+++ b/mercurial/revlog.py	Thu Jun 21 13:44:41 2018 +0100
@@ -1378,7 +1378,7 @@ 
     def descendant(self, start, end):
         if start == nullrev:
             return True
-        return start in self.ancestors([end])
+        return start in self.ancestors([end], inclusive=True)
 
     def commonancestorsheads(self, a, b):
         """calculate all the heads of the common ancestors of nodes a and b"""