Patchwork pull: prevent duplicated entry in `op.pulledsubset`

login
register
mail settings
Submitter Pierre-Yves David
Date March 27, 2014, 1:58 a.m.
Message ID <5d859e949ce867db1439.1395885520@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/4075/
State Accepted
Commit 09e7118715ebdb515a0b9bf87c00416d6c408842
Headers show

Comments

Pierre-Yves David - March 27, 2014, 1:58 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1395874532 25200
#      Wed Mar 26 15:55:32 2014 -0700
# Node ID 5d859e949ce867db1439448f6ea2bb3745252768
# Parent  8c210b169c694f654414a977a59cb5e4c34eb48f
pull: prevent duplicated entry in `op.pulledsubset`

In the bare pull case we could add the same node multiple time to the
`pulloperation.pulledsubset`. Beside being a bit wrong this confused the new
revset implementation of `revset._revancestor` into giving bad result.

This changeset fix the pull operation part. The fix for the revset itself will
come in another changeset.
David Soria Parra - March 27, 2014, 5:18 p.m.
pierre-yves.david@ens-lyon.org writes:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1395874532 25200
> #      Wed Mar 26 15:55:32 2014 -0700
> # Node ID 5d859e949ce867db1439448f6ea2bb3745252768
> # Parent  8c210b169c694f654414a977a59cb5e4c34eb48f
> pull: prevent duplicated entry in `op.pulledsubset`
>
> In the bare pull case we could add the same node multiple time to the
> `pulloperation.pulledsubset`. Beside being a bit wrong this confused the new
> revset implementation of `revset._revancestor` into giving bad result.
>
> This changeset fix the pull operation part. The fix for the revset itself will
> come in another changeset.
>
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -407,11 +407,16 @@ class pulloperation(object):
>          """heads of the set of changeset target by the pull"""
>          # compute target subset
>          if self.heads is None:
>              # We pulled every thing possible
>              # sync on everything common
> -            return self.common + self.rheads
> +            c = set(self.common)
> +            ret = list(self.common)
> +            for n in self.rheads:
> +                if n not in c:
> +                    ret.append(n)
> +            return ret

isn't this equivalent to

  list(set(self.common + self.rheads))

Maybe my python isn't good enough, but are the performance implications
for list + set that problematic to do it ourself?
Simon King - March 27, 2014, 5:50 p.m.
On Thu, Mar 27, 2014 at 5:18 PM, David Soria Parra <davidsp@fb.com> wrote:
> pierre-yves.david@ens-lyon.org writes:
>
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1395874532 25200
>> #      Wed Mar 26 15:55:32 2014 -0700
>> # Node ID 5d859e949ce867db1439448f6ea2bb3745252768
>> # Parent  8c210b169c694f654414a977a59cb5e4c34eb48f
>> pull: prevent duplicated entry in `op.pulledsubset`
>>
>> In the bare pull case we could add the same node multiple time to the
>> `pulloperation.pulledsubset`. Beside being a bit wrong this confused the new
>> revset implementation of `revset._revancestor` into giving bad result.
>>
>> This changeset fix the pull operation part. The fix for the revset itself will
>> come in another changeset.
>>
>> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
>> --- a/mercurial/exchange.py
>> +++ b/mercurial/exchange.py
>> @@ -407,11 +407,16 @@ class pulloperation(object):
>>          """heads of the set of changeset target by the pull"""
>>          # compute target subset
>>          if self.heads is None:
>>              # We pulled every thing possible
>>              # sync on everything common
>> -            return self.common + self.rheads
>> +            c = set(self.common)
>> +            ret = list(self.common)
>> +            for n in self.rheads:
>> +                if n not in c:
>> +                    ret.append(n)
>> +            return ret
>
> isn't this equivalent to
>
>   list(set(self.common + self.rheads))
>
> Maybe my python isn't good enough, but are the performance implications
> for list + set that problematic to do it ourself?

Pierre-Yves' version preserves the order of items in self.common and
self.rheads, whereas yours would lose that order.

(I've no idea if that's important though...)

Simon
Pierre-Yves David - March 27, 2014, 5:56 p.m.
On 03/27/2014 10:50 AM, Simon King wrote:
> On Thu, Mar 27, 2014 at 5:18 PM, David Soria Parra <davidsp@fb.com> wrote:
>> pierre-yves.david@ens-lyon.org writes:
>>
>>> # HG changeset patch
>>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>>> # Date 1395874532 25200
>>> #      Wed Mar 26 15:55:32 2014 -0700
>>> # Node ID 5d859e949ce867db1439448f6ea2bb3745252768
>>> # Parent  8c210b169c694f654414a977a59cb5e4c34eb48f
>>> pull: prevent duplicated entry in `op.pulledsubset`
>>>
>>> In the bare pull case we could add the same node multiple time to the
>>> `pulloperation.pulledsubset`. Beside being a bit wrong this confused the new
>>> revset implementation of `revset._revancestor` into giving bad result.
>>>
>>> This changeset fix the pull operation part. The fix for the revset itself will
>>> come in another changeset.
>>>
>>> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
>>> --- a/mercurial/exchange.py
>>> +++ b/mercurial/exchange.py
>>> @@ -407,11 +407,16 @@ class pulloperation(object):
>>>           """heads of the set of changeset target by the pull"""
>>>           # compute target subset
>>>           if self.heads is None:
>>>               # We pulled every thing possible
>>>               # sync on everything common
>>> -            return self.common + self.rheads
>>> +            c = set(self.common)
>>> +            ret = list(self.common)
>>> +            for n in self.rheads:
>>> +                if n not in c:
>>> +                    ret.append(n)
>>> +            return ret
>>
>> isn't this equivalent to
>>
>>    list(set(self.common + self.rheads))
>>
>> Maybe my python isn't good enough, but are the performance implications
>> for list + set that problematic to do it ourself?
>
> Pierre-Yves' version preserves the order of items in self.common and
> self.rheads, whereas yours would lose that order.
>
> (I've no idea if that's important though...)

What Simon says. It is usually easier to preserve order in our list.

The whole thing should not be to performace critical anyway.
Matt Mackall - April 2, 2014, 12:14 a.m.
On Wed, 2014-03-26 at 18:58 -0700, pierre-yves.david@ens-lyon.org wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1395874532 25200
> #      Wed Mar 26 15:55:32 2014 -0700
> # Node ID 5d859e949ce867db1439448f6ea2bb3745252768
> # Parent  8c210b169c694f654414a977a59cb5e4c34eb48f
> pull: prevent duplicated entry in `op.pulledsubset`

Queued for default, thanks.

I don't think there's any reason for preserving order here, so perhaps
we can convert all the lists to sets. But let's bikeshed by follow-up
patch.

Patch

diff --git a/mercurial/exchange.py b/mercurial/exchange.py
--- a/mercurial/exchange.py
+++ b/mercurial/exchange.py
@@ -407,11 +407,16 @@  class pulloperation(object):
         """heads of the set of changeset target by the pull"""
         # compute target subset
         if self.heads is None:
             # We pulled every thing possible
             # sync on everything common
-            return self.common + self.rheads
+            c = set(self.common)
+            ret = list(self.common)
+            for n in self.rheads:
+                if n not in c:
+                    ret.append(n)
+            return ret
         else:
             # We pulled a specific subset
             # sync on this subset
             return self.heads