Patchwork revert: remove set(mf) because it's O(repo)

login
register
mail settings
Submitter Durham Goode
Date March 2, 2017, 4:01 a.m.
Message ID <a8458fe51a9d155f1dae.1488427312@dev111.prn1.facebook.com>
Download mbox | patch
Permalink /patch/18863/
State Accepted
Delegated to: Martin von Zweigbergk
Headers show

Comments

Durham Goode - March 2, 2017, 4:01 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1488426665 28800
#      Wed Mar 01 19:51:05 2017 -0800
# Node ID a8458fe51a9d155f1daeaffdcf503e674d4d4588
# Parent  b787c41767339158927232ec7a9092196e887453
revert: remove set(mf) because it's O(repo)

The revert code had a 'set(manifest)' line in it, which has a runtime equivalent
to the size of the working copy. With alternative manifest implementations, like
treemanifest, this can be extra expensive. Let's rewrite it to be O(changes)
instead of O(working copy size).
via Mercurial-devel - March 2, 2017, 5:57 p.m.
On Wed, Mar 1, 2017 at 8:01 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1488426665 28800
> #      Wed Mar 01 19:51:05 2017 -0800
> # Node ID a8458fe51a9d155f1daeaffdcf503e674d4d4588
> # Parent  b787c41767339158927232ec7a9092196e887453
> revert: remove set(mf) because it's O(repo)

O(repo) sounds misleading to me, because it sounds like number of
revisions matter too. I'd change it to O(manifest)

>
> The revert code had a 'set(manifest)' line in it, which has a runtime equivalent
> to the size of the working copy. With alternative manifest implementations, like
> treemanifest, this can be extra expensive. Let's rewrite it to be O(changes)
> instead of O(working copy size).

Thanks! This was one of the remaining few pieces I never got around to
fixing two years ago (the other place is copies.py). Now that I see
your patch, I wonder why I didn't see that simple solution.

>
> diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
> --- a/mercurial/cmdutil.py
> +++ b/mercurial/cmdutil.py
> @@ -2977,10 +2977,12 @@ def revert(ui, repo, ctx, parents, *pats
>          modadded = set()
>
>          # split between files known in target manifest and the others

Does it make sense to leave this comment here? I'm not sure what its
scope is meant to be even before this patch.

> -        smf = set(mf)
>
>          # determine the exact nature of the deleted changesets
> -        deladded = _deleted - smf
> +        deladded = set(_deleted)
> +        for path in _deleted:
> +            if path in mf:
> +                deladded.remove(path)

Would it be better to not add it to the set in the first place?  Like so:

> -        deladded = _deleted - smf
> +        deladded = set()
> +        for path in _deleted:
> +            if path not in mf:
> +                deladded.add(path)

I can't see it making any difference in practice, so pick whichever
seems simpler (i.e. feel free to ignore).

>          deleted = _deleted - deladded
>
>          # We need to account for the state of the file in the dirstate,
> @@ -3024,7 +3026,10 @@ def revert(ui, repo, ctx, parents, *pats
>          # in case of merge, files that are actually added can be reported as
>          # modified, we need to post process the result
>          if p2 != nullid:
> -            mergeadd = dsmodified - smf
> +            mergeadd = set(dsmodified)
> +            for path in dsmodified:
> +                if path in mf:
> +                    mergeadd.remove(path)

Same here.

>              dsadded |= mergeadd
>              dsmodified -= mergeadd
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - March 2, 2017, 6:21 p.m.
On 3/2/17 9:57 AM, Martin von Zweigbergk wrote:
> On Wed, Mar 1, 2017 at 8:01 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1488426665 28800
>> #      Wed Mar 01 19:51:05 2017 -0800
>> # Node ID a8458fe51a9d155f1daeaffdcf503e674d4d4588
>> # Parent  b787c41767339158927232ec7a9092196e887453
>> revert: remove set(mf) because it's O(repo)
>
> O(repo) sounds misleading to me, because it sounds like number of
> revisions matter too. I'd change it to O(manifest)
>
>>
>> The revert code had a 'set(manifest)' line in it, which has a runtime equivalent
>> to the size of the working copy. With alternative manifest implementations, like
>> treemanifest, this can be extra expensive. Let's rewrite it to be O(changes)
>> instead of O(working copy size).
>
> Thanks! This was one of the remaining few pieces I never got around to
> fixing two years ago (the other place is copies.py). Now that I see
> your patch, I wonder why I didn't see that simple solution.
>
>>
>> diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
>> --- a/mercurial/cmdutil.py
>> +++ b/mercurial/cmdutil.py
>> @@ -2977,10 +2977,12 @@ def revert(ui, repo, ctx, parents, *pats
>>          modadded = set()
>>
>>          # split between files known in target manifest and the others
>
> Does it make sense to leave this comment here? I'm not sure what its
> scope is meant to be even before this patch.

Me either, which is why I left it... We can probably remove it though. 
Want me to resend, or you want to just drop this line during queueing, 
since I don't think I'm changing the bits below.

>> -        smf = set(mf)
>>
>>          # determine the exact nature of the deleted changesets
>> -        deladded = _deleted - smf
>> +        deladded = set(_deleted)
>> +        for path in _deleted:
>> +            if path in mf:
>> +                deladded.remove(path)
>
> Would it be better to not add it to the set in the first place?  Like so:
>
>> -        deladded = _deleted - smf
>> +        deladded = set()
>> +        for path in _deleted:
>> +            if path not in mf:
>> +                deladded.add(path)
>
> I can't see it making any difference in practice, so pick whichever
> seems simpler (i.e. feel free to ignore).

The set constructor does a smart allocation when doing a set copy, so we 
avoid internal set-growth reallocations by starting big and going small. 
  My assumption is that this stuff is going to be pretty small 
regardless, so it probably doesn't matter really.

See line 660 
http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup

>>          deleted = _deleted - deladded
>>
>>          # We need to account for the state of the file in the dirstate,
>> @@ -3024,7 +3026,10 @@ def revert(ui, repo, ctx, parents, *pats
>>          # in case of merge, files that are actually added can be reported as
>>          # modified, we need to post process the result
>>          if p2 != nullid:
>> -            mergeadd = dsmodified - smf
>> +            mergeadd = set(dsmodified)
>> +            for path in dsmodified:
>> +                if path in mf:
>> +                    mergeadd.remove(path)
>
> Same here.
>
>>              dsadded |= mergeadd
>>              dsmodified -= mergeadd
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=uZTCRgAVm8p5eoX9JtG7NXyHYRWAtyDglOxpwq5zkdk&s=PC6LNzvhXaSgoieJBJpN2q_sfLvi5N6IBK0Iv9VZcCg&e=
via Mercurial-devel - March 2, 2017, 6:37 p.m.
On Thu, Mar 2, 2017 at 10:21 AM, Durham Goode <durham@fb.com> wrote:
>
>
> On 3/2/17 9:57 AM, Martin von Zweigbergk wrote:
>>
>> On Wed, Mar 1, 2017 at 8:01 PM, Durham Goode <durham@fb.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode <durham@fb.com>
>>> # Date 1488426665 28800
>>> #      Wed Mar 01 19:51:05 2017 -0800
>>> # Node ID a8458fe51a9d155f1daeaffdcf503e674d4d4588
>>> # Parent  b787c41767339158927232ec7a9092196e887453
>>> revert: remove set(mf) because it's O(repo)
>>
>>
>> O(repo) sounds misleading to me, because it sounds like number of
>> revisions matter too. I'd change it to O(manifest)
>>
>>>
>>> The revert code had a 'set(manifest)' line in it, which has a runtime
>>> equivalent
>>> to the size of the working copy. With alternative manifest
>>> implementations, like
>>> treemanifest, this can be extra expensive. Let's rewrite it to be
>>> O(changes)
>>> instead of O(working copy size).
>>
>>
>> Thanks! This was one of the remaining few pieces I never got around to
>> fixing two years ago (the other place is copies.py). Now that I see
>> your patch, I wonder why I didn't see that simple solution.
>>
>>>
>>> diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
>>> --- a/mercurial/cmdutil.py
>>> +++ b/mercurial/cmdutil.py
>>> @@ -2977,10 +2977,12 @@ def revert(ui, repo, ctx, parents, *pats
>>>          modadded = set()
>>>
>>>          # split between files known in target manifest and the others
>>
>>
>> Does it make sense to leave this comment here? I'm not sure what its
>> scope is meant to be even before this patch.
>
>
> Me either, which is why I left it... We can probably remove it though. Want
> me to resend, or you want to just drop this line during queueing, since I
> don't think I'm changing the bits below.

I'll drop it in flight. I'm also changing the O(repo) to be
O(manifest) in the subject line. Thanks!

>
>>> -        smf = set(mf)
>>>
>>>          # determine the exact nature of the deleted changesets
>>> -        deladded = _deleted - smf
>>> +        deladded = set(_deleted)
>>> +        for path in _deleted:
>>> +            if path in mf:
>>> +                deladded.remove(path)
>>
>>
>> Would it be better to not add it to the set in the first place?  Like so:
>>
>>> -        deladded = _deleted - smf
>>> +        deladded = set()
>>> +        for path in _deleted:
>>> +            if path not in mf:
>>> +                deladded.add(path)
>>
>>
>> I can't see it making any difference in practice, so pick whichever
>> seems simpler (i.e. feel free to ignore).
>
>
> The set constructor does a smart allocation when doing a set copy, so we
> avoid internal set-growth reallocations by starting big and going small.  My
> assumption is that this stuff is going to be pretty small regardless, so it
> probably doesn't matter really.
>
> See line 660
> http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup
>
>>>          deleted = _deleted - deladded
>>>
>>>          # We need to account for the state of the file in the dirstate,
>>> @@ -3024,7 +3026,10 @@ def revert(ui, repo, ctx, parents, *pats
>>>          # in case of merge, files that are actually added can be
>>> reported as
>>>          # modified, we need to post process the result
>>>          if p2 != nullid:
>>> -            mergeadd = dsmodified - smf
>>> +            mergeadd = set(dsmodified)
>>> +            for path in dsmodified:
>>> +                if path in mf:
>>> +                    mergeadd.remove(path)
>>
>>
>> Same here.
>>
>>>              dsadded |= mergeadd
>>>              dsmodified -= mergeadd
>>>
>>> _______________________________________________
>>> Mercurial-devel mailing list
>>> Mercurial-devel@mercurial-scm.org
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=uZTCRgAVm8p5eoX9JtG7NXyHYRWAtyDglOxpwq5zkdk&s=PC6LNzvhXaSgoieJBJpN2q_sfLvi5N6IBK0Iv9VZcCg&e=

Patch

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -2977,10 +2977,12 @@  def revert(ui, repo, ctx, parents, *pats
         modadded = set()
 
         # split between files known in target manifest and the others
-        smf = set(mf)
 
         # determine the exact nature of the deleted changesets
-        deladded = _deleted - smf
+        deladded = set(_deleted)
+        for path in _deleted:
+            if path in mf:
+                deladded.remove(path)
         deleted = _deleted - deladded
 
         # We need to account for the state of the file in the dirstate,
@@ -3024,7 +3026,10 @@  def revert(ui, repo, ctx, parents, *pats
         # in case of merge, files that are actually added can be reported as
         # modified, we need to post process the result
         if p2 != nullid:
-            mergeadd = dsmodified - smf
+            mergeadd = set(dsmodified)
+            for path in dsmodified:
+                if path in mf:
+                    mergeadd.remove(path)
             dsadded |= mergeadd
             dsmodified -= mergeadd