Patchwork amend: fix amending rename commit with obsolescence markers

login
register
mail settings
Submitter Ryan McElroy
Date Oct. 18, 2014, 2:38 a.m.
Message ID <90fe3166419ec665f31c.1413599917@devbig105.prn2.facebook.com>
Download mbox | patch
Permalink /patch/6407/
State Superseded
Headers show

Comments

Ryan McElroy - Oct. 18, 2014, 2:38 a.m.
# HG changeset patch
# User Ryan McElroy <rmcelroy@fb.com>
# Date 1413466506 25200
#      Thu Oct 16 06:35:06 2014 -0700
# Node ID 90fe3166419ec665f31cdc657a8ebe753d7f168d
# Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
amend: fix amending rename commit with obsolescence markers

This addresses the bug described in issue4405: when obsolescence markers are
enabled, amending a commit with a file move can lead to the copy information
being lost.
Siddharth Agarwal - Oct. 18, 2014, 5:49 p.m.
On 10/17/2014 07:38 PM, Ryan McElroy wrote:
> # HG changeset patch
> # User Ryan McElroy <rmcelroy@fb.com>
> # Date 1413466506 25200
> #      Thu Oct 16 06:35:06 2014 -0700
> # Node ID 90fe3166419ec665f31cdc657a8ebe753d7f168d
> # Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
> amend: fix amending rename commit with obsolescence markers
>
> This addresses the bug described in issue4405

In that case please add (issue4405) to the end of the title (first line 
of the description).

Since this is a bugfix, please flag it as stable.

> : when obsolescence markers are
> enabled, amending a commit with a file move can lead to the copy information
> being lost.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -73,7 +73,32 @@
>   
>       if not hascommonancestor:
>           return None
> -    return limit
> +
> +    # Consider the following flow (tested in test-rename-amend.t):
> +    # 1/ File 'a' committed
> +    # 2/ File renamed from 'a' to 'b' in a new commit
> +    # 3/ Commit messaged updated with amend
> +    # 4/ File renamed from 'b' to 'c' and commit (2) amended
> +    #
> +    # When obsolescence markers are enabled, this would create a graph like:
> +    # @  4 mv b c
> +    # |
> +    # | x  3 temporary amend commit for 60e352014901
> +    # | |
> +    # | x  2 change message
> +    # |/
> +    # | x  1 mv a b
> +    # |/
> +    # o  0 add a
> +    #
> +    # During the second amend (step 4), limit is calculated to be 3, which does
> +    # not look back far enough to reconstruct the full transitive closure of the
> +    # renames.
> +    #
> +    # In this case, a is rev 4 and b is rev 0; taking the minimum of these three
> +    # ensures that we always look back far enough when we have obsolescence
> +    # markers enabled.
> +    return min(limit, a, b)
>   
>   def _chain(src, dst, a, b):
>       '''chain two sets of copies a->b'''
> diff --git a/tests/test-rename-amend.t b/tests/test-rename-amend.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-rename-amend.t
> @@ -0,0 +1,37 @@
> +Enable obsolescence markers
> +  $ cat > obs.py << EOF
> +  > import mercurial.obsolete
> +  > mercurial.obsolete._enabled = True
> +  > EOF
> +  $ cat >> $HGRCPATH <<EOF
> +  > [extensions]
> +  > EOF
> +  $ echo "obs=${TESTTMP}/obs.py" >> $HGRCPATH
> +
> +Set up test repo
> +  $ hg init repo
> +  $ cd repo
> +  $ echo a > a
> +  $ hg add a
> +  $ hg commit -m a
> +  $ hg mv a b
> +  $ hg commit -m 'mv a b'
> +  $ hg st -C --change .
> +  A b
> +    a
> +  R a
> +
> +Change the commit message
> +  $ hg commit --amend -m 'change commit message'
> +  $ hg st -C --change .
> +  A b
> +    a
> +  R a
> +
> +Move again
> +  $ hg mv b c
> +  $ hg commit --amend -m 'mv b c'
> +  $ hg st -C --change .
> +  A c
> +    a
> +  R a
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Mackall - Oct. 18, 2014, 8:49 p.m.
On Fri, 2014-10-17 at 19:38 -0700, Ryan McElroy wrote:
> # HG changeset patch
> # User Ryan McElroy <rmcelroy@fb.com>
> # Date 1413466506 25200
> #      Thu Oct 16 06:35:06 2014 -0700
> # Node ID 90fe3166419ec665f31cdc657a8ebe753d7f168d
> # Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
> amend: fix amending rename commit with obsolescence markers
> 
> This addresses the bug described in issue4405: when obsolescence markers are
> enabled, amending a commit with a file move can lead to the copy information
> being lost.

..but it's purely a DAG topology issue, as the copies code is completely
unaware of such markers, right? Which means it can affect anywhere this
sort of topology can occur: graft, rebase, histedit, shelve, etc.

(It'd be better to take obsolescence (and our horrible horrible amend
with temporary commits strategy) out of the picture entirely, as it just
makes things more confusing.)

> +    # @  4 mv b c
> +    # |
> +    # | x  3 temporary amend commit for 60e352014901
> +    # | |
> +    # | x  2 change message
> +    # |/
> +    # | x  1 mv a b
> +    # |/
> +    # o  0 add a
> +    #
> +    # During the second amend (step 4), limit is calculated to be 3, 
...
> +    # In this case, a is rev 4 and b is rev 0;
...
> +    return min(limit, a, b)

This doesn't make sense. See the docstring:

    """Find the earliest revision that's an ancestor of a or b but not both,    
    None if no such revision exists.                                            
    """

3 is not an ancestor of a(=4) or b(=0) n your picture, so 3 shouldn't
happen. And min(limit, a, b) is going to be 0, which doesn't comply with
the "not both" clause.

> --- /dev/null
> +++ b/tests/test-rename-amend.t

We now discourage the creation of wholly new test files as they
duplicate large amounts of boilerplate and bloat the already-slow test
suite. Better to just wedge a copy that breaks into an existing amend
test.
Pierre-Yves David - Oct. 18, 2014, 9:33 p.m.
On 10/18/2014 01:49 PM, Matt Mackall wrote:
> On Fri, 2014-10-17 at 19:38 -0700, Ryan McElroy wrote:
>> # HG changeset patch
>> # User Ryan McElroy <rmcelroy@fb.com>
>> # Date 1413466506 25200
>> #      Thu Oct 16 06:35:06 2014 -0700
>> # Node ID 90fe3166419ec665f31cdc657a8ebe753d7f168d
>> # Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
>> amend: fix amending rename commit with obsolescence markers
>>
>> This addresses the bug described in issue4405: when obsolescence markers are
>> enabled, amending a commit with a file move can lead to the copy information
>> being lost.
>
> ..but it's purely a DAG topology issue, as the copies code is completely
> unaware of such markers, right? Which means it can affect anywhere this
> sort of topology can occur: graft, rebase, histedit, shelve, etc.
>
> (It'd be better to take obsolescence (and our horrible horrible amend
> with temporary commits strategy) out of the picture entirely, as it just
> makes things more confusing.)

The obssmarker version is making this issue appears because of its lack 
of stripping. You raise an interesting point here that once we get rid 
of this (horrible horrible) temporary commit, this test will be be a 
valid test for this issue.

+1 for recreating the test without obsolescence involved.

>> +    # @  4 mv b c
>> +    # |
>> +    # | x  3 temporary amend commit for 60e352014901
>> +    # | |
>> +    # | x  2 change message
>> +    # |/
>> +    # | x  1 mv a b
>> +    # |/
>> +    # o  0 add a
>> +    #
>> +    # During the second amend (step 4), limit is calculated to be 3,
> ...
>> +    # In this case, a is rev 4 and b is rev 0;
> ...
>> +    return min(limit, a, b)
>
> This doesn't make sense. See the docstring:
>
>      """Find the earliest revision that's an ancestor of a or b but not both,
>      None if no such revision exists.
>      """

The wrong thing here is the docstring. We could summarize the docstring 
of this function as: "return the right value to bound tracecopy lookup"

> 3 is not an ancestor of a(=4) or b(=0) n your picture, so 3 shouldn't
> happen. And min(limit, a, b) is going to be 0, which doesn't comply with
> the "not both" clause.

As far as I understand this ancestors based logic is made necessary by 
case described in this test case.

http://selenic.com/hg/file/19f5273c9f3e/tests/test-mv-cp-st-diff.t#l1572

This ancestors based logic fails short (pun intended) in the case 
described by Ryan (when the filelog graph and changeset graph have 
different topology).

This mean call get us to we properly search down to the base (param "a") 
in all case and ensure we search beyond that when necessary 
(test-mv-cp-st-diff case).


To conclude I believe this change is correct and that the docstring need 
updating.
Ryan McElroy - Oct. 20, 2014, 3:31 p.m.
I don't have an intuition on how to reproduce this without obsolescence markers -- in order to make the changelog and filelog manifests have different topologies, the stripping needs to be avoided, which requires obsolescence, right? I'm happy to be wrong here, and I'll give it more thought to see if I can figure out where this would reproduce otherwise.

I'll update the docstring and move the test in the meantime.

~Ryan 

-----Original Message-----
From: Pierre-Yves David [mailto:pierre-yves.david@ens-lyon.org] 
Sent: Saturday, October 18, 2014 2:33 PM
To: Matt Mackall; Ryan McElroy
Cc: mercurial-devel@selenic.com
Subject: Re: [PATCH] amend: fix amending rename commit with obsolescence markers



On 10/18/2014 01:49 PM, Matt Mackall wrote:
> On Fri, 2014-10-17 at 19:38 -0700, Ryan McElroy wrote:
>> # HG changeset patch
>> # User Ryan McElroy <rmcelroy@fb.com> # Date 1413466506 25200
>> #      Thu Oct 16 06:35:06 2014 -0700
>> # Node ID 90fe3166419ec665f31cdc657a8ebe753d7f168d
>> # Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
>> amend: fix amending rename commit with obsolescence markers
>>
>> This addresses the bug described in issue4405: when obsolescence 
>> markers are enabled, amending a commit with a file move can lead to 
>> the copy information being lost.
>
> ..but it's purely a DAG topology issue, as the copies code is 
> completely unaware of such markers, right? Which means it can affect 
> anywhere this sort of topology can occur: graft, rebase, histedit, shelve, etc.
>
> (It'd be better to take obsolescence (and our horrible horrible amend 
> with temporary commits strategy) out of the picture entirely, as it 
> just makes things more confusing.)

The obssmarker version is making this issue appears because of its lack of stripping. You raise an interesting point here that once we get rid of this (horrible horrible) temporary commit, this test will be be a valid test for this issue.

+1 for recreating the test without obsolescence involved.

>> +    # @  4 mv b c
>> +    # |
>> +    # | x  3 temporary amend commit for 60e352014901
>> +    # | |
>> +    # | x  2 change message
>> +    # |/
>> +    # | x  1 mv a b
>> +    # |/
>> +    # o  0 add a
>> +    #
>> +    # During the second amend (step 4), limit is calculated to be 3,
> ...
>> +    # In this case, a is rev 4 and b is rev 0;
> ...
>> +    return min(limit, a, b)
>
> This doesn't make sense. See the docstring:
>
>      """Find the earliest revision that's an ancestor of a or b but not both,
>      None if no such revision exists.
>      """

The wrong thing here is the docstring. We could summarize the docstring of this function as: "return the right value to bound tracecopy lookup"

> 3 is not an ancestor of a(=4) or b(=0) n your picture, so 3 shouldn't 
> happen. And min(limit, a, b) is going to be 0, which doesn't comply 
> with the "not both" clause.

As far as I understand this ancestors based logic is made necessary by case described in this test case.

http://selenic.com/hg/file/19f5273c9f3e/tests/test-mv-cp-st-diff.t#l1572

This ancestors based logic fails short (pun intended) in the case described by Ryan (when the filelog graph and changeset graph have different topology).

This mean call get us to we properly search down to the base (param "a") in all case and ensure we search beyond that when necessary (test-mv-cp-st-diff case).


To conclude I believe this change is correct and that the docstring need updating.

--
Pierre-Yves David
Pierre-Yves David - Oct. 20, 2014, 5:51 p.m.
On 10/20/2014 08:31 AM, Ryan McElroy wrote:
> I don't have an intuition on how to reproduce this without obsolescence markers -- in order to make the changelog and filelog manifests have different topologies, the stripping needs to be avoided, which requires obsolescence, right? I'm happy to be wrong here, and I'll give it more thought to see if I can figure out where this would reproduce otherwise.

You can skip the first amend and use revert to achieve the same 
changelog//filelog topology divergence:

   $ hg init repo
   $ cd repo
   $ touch a0
   $ hg add a0
   $ hg commit -m a0
   $ hg mv a0 a1
   $ hg commit -m a1
   $ hg up 0
   1 files updated, 0 files merged, 1 files removed, 0 files unresolved
   $ hg revert --rev 'desc(a1)' --all
   removing a0
   adding a1
   $ hg ci -m 'a1-msg'
   created new head
   $ hg mv a1 a2
   $ hg status --copies --rev 'desc(a0)'
   A a2
     a0
   R a0
   $ hg ci --amend
   saved backup bundle to 
$TESTTMP/repo/.hg/strip-backup/ce7fa13c3beb-amend-backup.hg
   $ hg status --copies --rev 'desc(a0)'
   A a2
     a0
   R a0


> I'll update the docstring and move the test in the meantime.

resend was not really necessary since we still have the obsolescence 
bits to solve.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -73,7 +73,32 @@ 
 
     if not hascommonancestor:
         return None
-    return limit
+
+    # Consider the following flow (tested in test-rename-amend.t):
+    # 1/ File 'a' committed
+    # 2/ File renamed from 'a' to 'b' in a new commit
+    # 3/ Commit messaged updated with amend
+    # 4/ File renamed from 'b' to 'c' and commit (2) amended
+    #
+    # When obsolescence markers are enabled, this would create a graph like:
+    # @  4 mv b c
+    # |
+    # | x  3 temporary amend commit for 60e352014901
+    # | |
+    # | x  2 change message
+    # |/
+    # | x  1 mv a b
+    # |/
+    # o  0 add a
+    #
+    # During the second amend (step 4), limit is calculated to be 3, which does
+    # not look back far enough to reconstruct the full transitive closure of the
+    # renames.
+    #
+    # In this case, a is rev 4 and b is rev 0; taking the minimum of these three
+    # ensures that we always look back far enough when we have obsolescence
+    # markers enabled.
+    return min(limit, a, b)
 
 def _chain(src, dst, a, b):
     '''chain two sets of copies a->b'''
diff --git a/tests/test-rename-amend.t b/tests/test-rename-amend.t
new file mode 100644
--- /dev/null
+++ b/tests/test-rename-amend.t
@@ -0,0 +1,37 @@ 
+Enable obsolescence markers
+  $ cat > obs.py << EOF
+  > import mercurial.obsolete
+  > mercurial.obsolete._enabled = True
+  > EOF
+  $ cat >> $HGRCPATH <<EOF
+  > [extensions]
+  > EOF
+  $ echo "obs=${TESTTMP}/obs.py" >> $HGRCPATH
+
+Set up test repo
+  $ hg init repo
+  $ cd repo
+  $ echo a > a
+  $ hg add a
+  $ hg commit -m a
+  $ hg mv a b
+  $ hg commit -m 'mv a b'
+  $ hg st -C --change .
+  A b
+    a
+  R a
+
+Change the commit message
+  $ hg commit --amend -m 'change commit message'
+  $ hg st -C --change .
+  A b
+    a
+  R a
+
+Move again
+  $ hg mv b c
+  $ hg commit --amend -m 'mv b c'
+  $ hg st -C --change .
+  A c
+    a
+  R a