Patchwork [6,of,6] revset: fix a crash with 'roots(wdir())'

login
register
mail settings
Submitter Matt Harbison
Date July 1, 2015, 2:56 a.m.
Message ID <c3aecbbe50596e54402e.1435719403@Envy>
Download mbox | patch
Permalink /patch/9840/
State Changes Requested
Headers show

Comments

Matt Harbison - July 1, 2015, 2:56 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1435715342 14400
#      Tue Jun 30 21:49:02 2015 -0400
# Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
# Parent  e26c2caf800020627f58e45f799ce932c6aee761
revset: fix a crash with 'roots(wdir())'

The crash was "TypeError: expected string or Unicode object, NoneType found"
down in revlog.parentrevs().
Pierre-Yves David - July 1, 2015, 3:24 a.m.
On 06/30/2015 07:56 PM, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1435715342 14400
> #      Tue Jun 30 21:49:02 2015 -0400
> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
> revset: fix a crash with 'roots(wdir())'

This series worries me about two things:

1) performance impact, can we get than benchmarked

2) this is a lot of churn using None, could we settle for another value 
(max int) and move toward that instead?
Matt Harbison - July 1, 2015, 3:31 a.m.
On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 06/30/2015 07:56 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1435715342 14400
>> #      Tue Jun 30 21:49:02 2015 -0400
>> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
>> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
>> revset: fix a crash with 'roots(wdir())'
>
> This series worries me about two things:
>
> 1) performance impact, can we get than benchmarked

I've seen you do benchmarking patches before, but admittedly didn't pay  
much attention to what you were doing or how to benchmark.  Can you point  
me to some specific steps to perform?

> 2) this is a lot of churn using None, could we settle for another value  
> (max int) and move toward that instead?

https://www.selenic.com/pipermail/mercurial-devel/2015-June/071656.html
Pierre-Yves David - July 1, 2015, 3:51 a.m.
On 06/30/2015 08:31 PM, Matt Harbison wrote:
> On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>
>>
>>
>> On 06/30/2015 07:56 PM, Matt Harbison wrote:
>>> # HG changeset patch
>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>> # Date 1435715342 14400
>>> #      Tue Jun 30 21:49:02 2015 -0400
>>> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
>>> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
>>> revset: fix a crash with 'roots(wdir())'
>>
>> This series worries me about two things:
>>
>> 1) performance impact, can we get than benchmarked
>
> I've seen you do benchmarking patches before, but admittedly didn't pay
> much attention to what you were doing or how to benchmark.  Can you
> point me to some specific steps to perform?

Sure,

The benchmark runner is in contrib/revsetbenchmarks.py and should be 
documented (come back with any question you have, so we can improve the 
doc) there is file with "canonical" revset in "*-revsets.txt" they 
should be self documenting too.

(thanks for looking at this)
Yuya Nishihara - July 1, 2015, 1:01 p.m.
On Tue, 30 Jun 2015 23:31:32 -0400, Matt Harbison wrote:
> On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David  
> <pierre-yves.david@ens-lyon.org> wrote:
> > On 06/30/2015 07:56 PM, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1435715342 14400
> >> #      Tue Jun 30 21:49:02 2015 -0400
> >> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
> >> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
> >> revset: fix a crash with 'roots(wdir())'
> >
> > This series worries me about two things:
> >
> > 1) performance impact, can we get than benchmarked
> 
> I've seen you do benchmarking patches before, but admittedly didn't pay  
> much attention to what you were doing or how to benchmark.  Can you point  
> me to some specific steps to perform?
> 
> > 2) this is a lot of churn using None, could we settle for another value  
> > (max int) and move toward that instead?
> 
> https://www.selenic.com/pipermail/mercurial-devel/2015-June/071656.html

It means all "None"s in this series will have to be replaced by wdirrev.
Perhaps I should send my wdirrev patches soon.

I want to add one more thing:

3) can we have a function that is as fast as cl.parentrevs(r) and can process
   wdir() ?
Matt Harbison - July 2, 2015, 1:09 a.m.
On Tue, 30 Jun 2015 23:51:51 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 06/30/2015 08:31 PM, Matt Harbison wrote:
>> On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>>
>>>
>>>
>>> On 06/30/2015 07:56 PM, Matt Harbison wrote:
>>>> # HG changeset patch
>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>> # Date 1435715342 14400
>>>> #      Tue Jun 30 21:49:02 2015 -0400
>>>> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
>>>> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
>>>> revset: fix a crash with 'roots(wdir())'
>>>
>>> This series worries me about two things:
>>>
>>> 1) performance impact, can we get than benchmarked
>>
>> I've seen you do benchmarking patches before, but admittedly didn't pay
>> much attention to what you were doing or how to benchmark.  Can you
>> point me to some specific steps to perform?
>
> Sure,
>
> The benchmark runner is in contrib/revsetbenchmarks.py and should be  
> documented (come back with any question you have, so we can improve the  
> doc) there is file with "canonical" revset in "*-revsets.txt" they  
> should be self documenting too.
>
> (thanks for looking at this)

I ran this:

    ./contrib/revsetbenchmarks.py -f contrib/all-revsets.txt c76e8d14383a  
tip

It looks like it flags the following:

#26: children(tip~100)  rev..rst      106%
#55: :10000 and draft() reverse       105%

Not sure why 34 errors out on OS X.  A second run had both at 105%, so it  
wasn't a system hiccup, but the raw numbers seem small.  I can paste the  
whole thing if you want, but it wraps and doesn't line up...


Result by revset
================

Revision:
0) Revision  25788:c76e8d14383a: merge with stable
1) Revision  (tip) 25794:172d6d986ffa: revset: fix a crash with  
'roots(wdir())'

revset #26: children(tip~100)
    plain         min           max           first         last           
reverse       rev..rst      rev..ast      sort          sor..rst       
sor..ast
0) 0.006284      0.005795      0.005781      0.006023      0.005848       
0.005667      0.005655      0.006309      0.005788      0.006040       
0.005840
1) 0.006154      0.005787      0.005791      0.006062      0.005805       
0.005782      0.006006 106% 0.006349      0.005813      0.006033       
0.006106

revset #55: :10000 and draft()
    plain         min           max           first         last           
reverse       rev..rst      rev..ast      sort          sor..rst       
sor..ast
0) 0.002676      0.002839      0.002840      0.002791      0.002730       
0.002700      0.003087      0.002979      0.002697      0.002876       
0.002825
1) 0.002698      0.002843      0.002845      0.002901      0.002858       
0.002841 105% 0.003001      0.002972      0.002816      0.002868       
0.002744
Matt Harbison - July 2, 2015, 1:28 a.m.
On Wed, 01 Jul 2015 09:01:17 -0400, Yuya Nishihara <yuya@tcha.org> wrote:

> On Tue, 30 Jun 2015 23:31:32 -0400, Matt Harbison wrote:
>> On Tue, 30 Jun 2015 23:24:32 -0400, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>> > On 06/30/2015 07:56 PM, Matt Harbison wrote:
>> >> # HG changeset patch
>> >> # User Matt Harbison <matt_harbison@yahoo.com>
>> >> # Date 1435715342 14400
>> >> #      Tue Jun 30 21:49:02 2015 -0400
>> >> # Node ID c3aecbbe50596e54402e4bd98931d33f1b2d5d01
>> >> # Parent  e26c2caf800020627f58e45f799ce932c6aee761
>> >> revset: fix a crash with 'roots(wdir())'
>> >
>> > This series worries me about two things:
>> >
>> > 1) performance impact, can we get than benchmarked
>>
>> I've seen you do benchmarking patches before, but admittedly didn't pay
>> much attention to what you were doing or how to benchmark.  Can you  
>> point
>> me to some specific steps to perform?
>>
>> > 2) this is a lot of churn using None, could we settle for another  
>> value
>> > (max int) and move toward that instead?
>>
>> https://www.selenic.com/pipermail/mercurial-devel/2015-June/071656.html
>
> It means all "None"s in this series will have to be replaced by wdirrev.
> Perhaps I should send my wdirrev patches soon.

I'll set this aside for now then- they aren't currently an issue for me.   
I just seemed worth fixing all of the changelog related errors after  
stumbling on it in heads().

> I want to add one more thing:
>
> 3) can we have a function that is as fast as cl.parentrevs(r) and can  
> process
>    wdir() ?

Something like this in revset.py?  It looks like an optimization to not go  
through context.

def _parentrevs(repo, cl, r):
     if r is not None:
         return cl.parentrevs(r)
     else:
         return [p.rev() for p in repo[r].parents()]

The only reason I didn't do it was because with the local changelog  
caching optimization, I assumed calling a function over and over would be  
unacceptable overhead.
Yuya Nishihara - July 2, 2015, 3:02 p.m.
On Wed, 01 Jul 2015 21:28:47 -0400, Matt Harbison wrote:
> On Wed, 01 Jul 2015 09:01:17 -0400, Yuya Nishihara <yuya@tcha.org> wrote:
> > 3) can we have a function that is as fast as cl.parentrevs(r) and can  
> > process
> >    wdir() ?
> 
> Something like this in revset.py?  It looks like an optimization to not go  
> through context.
> 
> def _parentrevs(repo, cl, r):
>      if r is not None:
>          return cl.parentrevs(r)
>      else:
>          return [p.rev() for p in repo[r].parents()]

I don't have a good idea, but something like 'repo.parentrevs()'. I know we
shouldn't add new function to localrepo, but it will have to access both
changelog and dirstate to process the wdir revision seamlessly.

> The only reason I didn't do it was because with the local changelog  
> caching optimization, I assumed calling a function over and over would be  
> unacceptable overhead.

Probably yes. revset functions avoid extra name lookup and function calls in
heavy loop.
Yuya Nishihara - July 4, 2015, 1:29 p.m.
On Fri, 03 Jul 2015 09:51:31 -0700, Pierre-Yves David wrote:
> On 07/01/2015 06:28 PM, Matt Harbison wrote:
> > On Wed, 01 Jul 2015 09:01:17 -0400, Yuya Nishihara <yuya@tcha.org> wrote:
> >> 3) can we have a function that is as fast as cl.parentrevs(r) and can
> >> process
> >>    wdir() ?
> >
> > Something like this in revset.py?  It looks like an optimization to not
> > go through context.
> >
> > def _parentrevs(repo, cl, r):
> >      if r is not None:
> >          return cl.parentrevs(r)
> >      else:
> >          return [p.rev() for p in repo[r].parents()]
> >
> > The only reason I didn't do it was because with the local changelog
> > caching optimization, I assumed calling a function over and over would
> > be unacceptable overhead.

It could be a generator to minimize the number of repo.changelog calls, but
I agree it would be still slower than the inlined version.

  def _iterparentrevs(repo, subset):
      cl = repo.changelog
      for r in subset:
          if r is not None:
              yield cl.parentrevs(r)
          else:
              yield [p.rev() for p in repo[r].parents()]

> So I think we should have the API not panic when handed and 
> 'wdirrev'/'wdirid' (the same way we probably want to detect 'nullrev'/….
> 
> I see possibles path for this:
> 
> 1) Have the low level API support the 'wdirrev' by providing them some 
> access to the dirstate data. That would be the "best" as it would ensure 
> transparent support for 'wdir' everywhere (mostly). However, this would 
> probably requires a massive number of layer violation and mudding.
> 
> 2) Have low level API detect 'wdirrev' case and "ignore" it, issue a 
> devel-warn for developer to spot such case. This will provide a middle 
> ground between full support (1) and plain crash (what we have now).

I think (1) would be the hard way. My current idea is similar to (2), but
raising exception instead of devel-warning which needs 'ui' object.

  cl.node(wdirrev) -> wdirid
  cl.rev(wdirnode) -> wdirrev
  cl.parentrevs(wdirrev) -> UnsupportedNodeError

Then, cl.parentrevs(r) will be able to fall back to repo[r].parents():

  try:
      ps.add(cl.parentrevs(r)[0])
  except error.UnsupportedNodeError:
      ps.add(repo[r].p1().rev())

The overhead will be minimal because 'wdir()' will be rarely used.

> 3) As you guys have been working on it more closely than I did, you will 
> likely comes with another better solution.
> 
> 4) I got convinced that my worries are groundless.

I'm sure (4) never ever happen. ;)
Pierre-Yves David - July 18, 2015, 3:16 p.m.
On 07/04/2015 03:29 PM, Yuya Nishihara wrote:
> On Fri, 03 Jul 2015 09:51:31 -0700, Pierre-Yves David wrote:
>> On 07/01/2015 06:28 PM, Matt Harbison wrote:
>>> On Wed, 01 Jul 2015 09:01:17 -0400, Yuya Nishihara <yuya@tcha.org> wrote:
>>>> 3) can we have a function that is as fast as cl.parentrevs(r) and can
>>>> process
>>>>     wdir() ?
>>>
>>> Something like this in revset.py?  It looks like an optimization to not
>>> go through context.
>>>
>>> def _parentrevs(repo, cl, r):
>>>       if r is not None:
>>>           return cl.parentrevs(r)
>>>       else:
>>>           return [p.rev() for p in repo[r].parents()]
>>>
>>> The only reason I didn't do it was because with the local changelog
>>> caching optimization, I assumed calling a function over and over would
>>> be unacceptable overhead.
>
> It could be a generator to minimize the number of repo.changelog calls, but
> I agree it would be still slower than the inlined version.
>
>    def _iterparentrevs(repo, subset):
>        cl = repo.changelog
>        for r in subset:
>            if r is not None:
>                yield cl.parentrevs(r)
>            else:
>                yield [p.rev() for p in repo[r].parents()]
>
>> So I think we should have the API not panic when handed and
>> 'wdirrev'/'wdirid' (the same way we probably want to detect 'nullrev'/….
>>
>> I see possibles path for this:
>>
>> 1) Have the low level API support the 'wdirrev' by providing them some
>> access to the dirstate data. That would be the "best" as it would ensure
>> transparent support for 'wdir' everywhere (mostly). However, this would
>> probably requires a massive number of layer violation and mudding.
>>
>> 2) Have low level API detect 'wdirrev' case and "ignore" it, issue a
>> devel-warn for developer to spot such case. This will provide a middle
>> ground between full support (1) and plain crash (what we have now).
>
> I think (1) would be the hard way. My current idea is similar to (2), but
> raising exception instead of devel-warning which needs 'ui' object.
>
>    cl.node(wdirrev) -> wdirid
>    cl.rev(wdirnode) -> wdirrev
>    cl.parentrevs(wdirrev) -> UnsupportedNodeError

The goal here would be to avoid to crash (crash to the user are bad) in 
multiple place where people will be using old code (especially 
extensions). Raising an exception would not help in this regard (except 
it would be a dedicated error)

But as you pointed, cl have neither a repo or a ui, which limit our 
options a lot,

> Then, cl.parentrevs(r) will be able to fall back to repo[r].parents():
>
>    try:
>        ps.add(cl.parentrevs(r)[0])
>    except error.UnsupportedNodeError:
>        ps.add(repo[r].p1().rev())
>
> The overhead will be minimal because 'wdir()' will be rarely used.

Setting up the exception catching will likely have an overhead:

no-except: 1.03 ms

   for i in xrange(2**16)
     i

with-except: 1.57 ms

   for i in xrange(2**16)
     try:
       i
     except:
       pass

>
>> 3) As you guys have been working on it more closely than I did, you will
>> likely comes with another better solution.
>>
>> 4) I got convinced that my worries are groundless.
>
> I'm sure (4) never ever happen. ;)
Yuya Nishihara - July 20, 2015, 11:46 a.m.
On Sat, 18 Jul 2015 17:16:14 +0200, Pierre-Yves David wrote:
> On 07/04/2015 03:29 PM, Yuya Nishihara wrote:
> > I think (1) would be the hard way. My current idea is similar to (2), but
> > raising exception instead of devel-warning which needs 'ui' object.
> >
> >    cl.node(wdirrev) -> wdirid
> >    cl.rev(wdirnode) -> wdirrev
> >    cl.parentrevs(wdirrev) -> UnsupportedNodeError
> 
> The goal here would be to avoid to crash (crash to the user are bad) in
> multiple place where people will be using old code (especially
> extensions). Raising an exception would not help in this regard (except
> it would be a dedicated error)

Perhaps it can be caught at dispatch().

We might want to translate it to devel-warning at revset.getset(), but I'm
unsure if it go well because revsets are evaluated lazily.

> But as you pointed, cl have neither a repo or a ui, which limit our
> options a lot,
> 
> > Then, cl.parentrevs(r) will be able to fall back to repo[r].parents():
> >
> >    try:
> >        ps.add(cl.parentrevs(r)[0])
> >    except error.UnsupportedNodeError:
> >        ps.add(repo[r].p1().rev())
> >
> > The overhead will be minimal because 'wdir()' will be rarely used.
> 
> Setting up the exception catching will likely have an overhead:
> 
> no-except: 1.03 ms
> 
>    for i in xrange(2**16)
>      i
> 
> with-except: 1.57 ms
> 
>    for i in xrange(2**16)
>      try:
>        i
>      except:
>        pass

Yes, there are two extra opcodes in hot loop.

  (without try-except)

        >>   13 FOR_ITER                10 (to 26)
             16 STORE_FAST               0 (i)
  3          19 LOAD_FAST                0 (i)
             22 POP_TOP
             23 JUMP_ABSOLUTE           13

  (with try-except)

        >>   13 FOR_ITER                24 (to 40)
             16 STORE_FAST               0 (i)
  3          19 SETUP_EXCEPT             8 (to 30)
  4          22 LOAD_FAST                0 (i)
             25 POP_TOP
             26 POP_BLOCK
             27 JUMP_ABSOLUTE           13

If we can't accept this overhead, we'll probably have to move try-except
out of the loop:

  try:
      for r in ...:
          ps.add(cl.parentrevs(r)[0])
  except error.UnsupportedNodeError:
      # pay 2x penalty if wdir exists
      for r in ...:
          try:
              ps.add(cl.parentrevs(r)[0])
          except error.UnsupportedNodeError:
              ps.add(repo[r].p1().rev())

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1794,8 +1794,12 @@ 
     Changesets in set with no parent changeset in set.
     """
     s = getset(repo, fullreposet(repo), x)
-    parents = repo.changelog.parentrevs
+    parentrevs = repo.changelog.parentrevs
     def filter(r):
+        parents = parentrevs
+        if r is None:
+            parents = lambda r: [p.rev() for p in repo[r].parents()]
+
         for p in parents(r):
             if 0 <= p and p in s:
                 return False
diff --git a/tests/test-merge-default.t b/tests/test-merge-default.t
--- a/tests/test-merge-default.t
+++ b/tests/test-merge-default.t
@@ -102,6 +102,11 @@ 
   user:        test
   date:        * (glob)
   
+  $ hg log -r 'roots(wdir())'
+  changeset:   5:a431fabd6039+
+  user:        test
+  date:        * (glob)
+  
 Should succeed - 2 heads:
 
   $ hg merge -P