Patchwork [3,of,3] copies: do not track backward copies, only renames (issue3739)

login
register
mail settings
Submitter Siddharth Agarwal
Date Dec. 21, 2012, 9:08 p.m.
Message ID <af618ad15aa529d53062.1356124103@sid0x220>
Download mbox | patch
Permalink /patch/264/
State Accepted
Commit f23dea2b296e0ec2845fc27746dad6341fa2e338
Headers show

Comments

Siddharth Agarwal - Dec. 21, 2012, 9:08 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0 at fb.com>
# Date 1356123985 28800
# Node ID af618ad15aa529d53062f7cf73d45f539f72ebfd
# Parent  554d6cba20701d6b44870a79389cc89812173aed
copies: do not track backward copies, only renames (issue3739)

The inverse of a rename is a rename back, but the inverse of a copy is a
remove, not a "copy back". Removes are already tracked separately, which means
that returning inverted copies serves no useful purpose. Indeed, it can and
does cause bugs because other code starts believing the fake inverted copies
are actually real.

The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
status -C command is run where a copy is involved, the inverted version of the
copy is no longer displayed (though the remove remains intact). Inverted
copies don't make sense anyway, so the point is moot.
Matt Mackall - Dec. 21, 2012, 11:42 p.m.
On Fri, 2012-12-21 at 13:08 -0800, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1356123985 28800
> # Node ID af618ad15aa529d53062f7cf73d45f539f72ebfd
> # Parent  554d6cba20701d6b44870a79389cc89812173aed
> copies: do not track backward copies, only renames (issue3739)
> 
> The inverse of a rename is a rename back, but the inverse of a copy is a
> remove, not a "copy back".

Strictly speaking, it's not this simple. Insofar as a rename is "simply"
a copy+remove, there is a real sense that a copy does have an inverse. 

For instance, consider this scenario:

changeset 1:
 create a
changeset 2:
 copy a to b
changeset 3:
 delete a

Now we can clearly say what the following diffs should look like:

 1->2: show a copy
 2->3: show a delete
 1->3 = 1->2 + 2->3: show a rename

And we can even say what some of the reverse diffs should look like:

 3->1: show a rename(!)
 3->2: show a file add

But what should 2->1 look like? If it doesn't mention some kind of
relation between a and b, we've lost information relative to the the
forward diffs, nor can we construct 3->2 + 2->1 = 3->1. So we've got
some sort of relation here (let's call it a 'ypoc') such that 'add' +
'ypoc' = 'rename' in the same way that 'copy' + 'remove' = 'rename'. 

And as it happens, ypocs are actually a real thing in Mercurial, though
we admittedly don't handle them very well, nor do we have any way to
display them. A ypoc exists in the following not entirely uncommon
situation:

1-2-4
 \ /
  3

1: create a
2: copy a to b
3: modify a
4: merge

Here, merge should merge the changes of a into b and record both a and b
as ancestors for the result[1], giving us an 2:1 relationship that's the
inverse of a typical 1:n copy. If we then say "we don't really want a"
and commit that removal with the merge, we've got a genuine ypoc. 

In other words, a ypoc is a cross-file merge where the 'other' file
doesn't exist afterwards. And that actually suggests that copy itself
not really the fundamental relationship, because you can have all of the
following topologies:

copy   ypoc   rename  rename  ???   ???

 a-a   a-a    a          a    a-a   a-a
  \     /      \        /      /     \
   b   b        b      b      b-b   b-b

(the ??? case comes up naturally in merges involving copies as we've
seen above)

So really, what we've got is a 'cross-filename inheritance relation' and
depending on the topology, we can think of it as a copy, rename, ypoc,
etc.

In theory, pathcopies actually conceptually cares about this. Graft and
rebase care about a topology that looks like this:

 o-o-o-o-x-o-o-o-o-a
          \
           o-o-b

where pathcopies(a,b) = pathcopies(a,x) + pathcopies(x,b). If there's a
'add' on the a->x path (aka a removal on x->a) and a 'ypoc' on the x->b
path, then we may have the equivalent of a rename on the joined path.

But again, we handle these things very badly at present, which is not
surprising given how mind-warping it is:

 - we can't display ypocs or ???s usefully
   (though they matter for annotate!)
 - we probably don't/can't record some of these cases properly
 - pathcopies and mergecopies are not capable of representing them
 - we get confused by them when doing graft/rebase

As you've seen, pathcopies uses a dict, so it can only represent 1:n
relationships, and not n:1 ones, nor n:m, which can also exist!

But as you've also seen, we actually want to get some of this sort of
information out of at least mergecopies, hence the multiple dictionaries
that are returned.

For now, you're probably right that whatever the correct and complete
semantics should be, what _backwardsrename is doing is wrong-ish as a
ypoc is definitely not a copy, nor do we say we're returning 'cross-file
inheritance relations'.

You also may have discovered that mergecopies does not yet properly
handle the idea that the given ancestor may not actually be a
topological ancestor (as needed by rebase/graft).

At some point, I want to rationalize things so that mergecopies is
defined in terms of pathcopies so that it handles backwards legs, and
maybe eventually have a well-defined theory for what should happen with
the reverse-copy-like cases beyond 'punt'.

All of this is a long way of saying that I don't think your commit
comment is correct.

[1] See
http://www.selenic.com/hg/file/98687cdddcb1/mercurial/localrepo.py#l1026
Siddharth Agarwal - Dec. 22, 2012, 3:50 a.m.
On 12/21/2012 03:42 PM, Matt Mackall wrote:
> All of this is a long way of saying that I don't think your commit 
> comment is correct. 

Thanks for the correction. I agree that "ypoc" is something we should 
eventually handle, perhaps with a richer data structure that supports 
the full range of possibilities. What do you think of keeping the code 
the same, but updating the commit message to:

copies: do not track backward copies, only renames (issue3739)

The inverse of a rename is a rename, but the inverse of a copy is not a 
copy.
Presenting it as such -- in particular, stuffing it into the same dict 
as real
copies -- causes bugs because other code starts believing the inverse copies
are real.

The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
status -C command is run where a copy is involved, the inverse copy 
(which was
hitherto presented as a real copy) is no longer displayed.

Keeping track of inverse copies is useful in some situations -- 
composability
of diffs, for example, since a delete of "b" followed by an inverse copy "b"
to "a" is equivalent to a rename "b" to "a". However, representing them 
would
require a more complex data structure than the same dict in which real 
copies
are also stored.
Siddharth Agarwal - Dec. 22, 2012, 5:02 a.m.
On 12/21/2012 07:50 PM, Siddharth Agarwal wrote:
> Keeping track of inverse copies is useful in some situations -- 
> composability
> of diffs, for example, since a delete of "b" followed by an inverse 
> copy "b"

A delete of "a", not of "b".
Matt Mackall - Dec. 22, 2012, 7:39 p.m.
On Fri, 2012-12-21 at 19:50 -0800, Siddharth Agarwal wrote:
> On 12/21/2012 03:42 PM, Matt Mackall wrote:
> > All of this is a long way of saying that I don't think your commit 
> > comment is correct. 
> 
> Thanks for the correction. I agree that "ypoc" is something we should 
> eventually handle, perhaps with a richer data structure that supports 
> the full range of possibilities. What do you think of keeping the code 
> the same, but updating the commit message to:
> 
> copies: do not track backward copies, only renames (issue3739)
> 
> The inverse of a rename is a rename, but the inverse of a copy is not a 
> copy.
> Presenting it as such -- in particular, stuffing it into the same dict 
> as real
> copies -- causes bugs because other code starts believing the inverse copies
> are real.
> 
> The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
> status -C command is run where a copy is involved, the inverse copy 
> (which was
> hitherto presented as a real copy) is no longer displayed.
> 
> Keeping track of inverse copies is useful in some situations -- 
> composability
> of diffs, for example, since a delete of "b" followed by an inverse copy "b"
> to "a" is equivalent to a rename "b" to "a". However, representing them 
> would
> require a more complex data structure than the same dict in which real 
> copies
> are also stored.

Seems fine.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -145,12 +145,16 @@  def _forwardcopies(a, b):
 
     return cm
 
-def _backwardcopies(a, b):
-    # because the forward mapping is 1:n, we can lose renames here
-    # in particular, we find renames better than copies
+def _backwardrenames(a, b):
+    # Even though we're not taking copies into account, 1:n rename situations
+    # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
+    # arbitrarily pick one of the renames.
     f = _forwardcopies(b, a)
     r = {}
     for k, v in f.iteritems():
+        # remove copies
+        if v in a:
+            continue
         r[v] = k
     return r
 
@@ -162,8 +166,8 @@  def pathcopies(x, y):
     if a == x:
         return _forwardcopies(x, y)
     if a == y:
-        return _backwardcopies(x, y)
-    return _chain(x, y, _backwardcopies(x, a), _forwardcopies(a, y))
+        return _backwardrenames(x, y)
+    return _chain(x, y, _backwardrenames(x, a), _forwardcopies(a, y))
 
 def mergecopies(repo, c1, c2, ca):
     """
diff --git a/tests/test-mv-cp-st-diff.t b/tests/test-mv-cp-st-diff.t
--- a/tests/test-mv-cp-st-diff.t
+++ b/tests/test-mv-cp-st-diff.t
@@ -670,7 +670,6 @@  single copy
   
   % hg st -C --rev . --rev 0
   M a
-    b
   R b
   
   % hg diff --git --rev . --rev 0
@@ -728,7 +727,6 @@  single copy
   
   % hg st -C --rev . --rev 2
   M a
-    b
   A x/y
   R b
   
@@ -1072,7 +1070,6 @@  copy chain
   
   % hg st -C --rev . --rev 0
   M a
-    b
   R b
   R c
   
@@ -1148,7 +1145,6 @@  copy chain
   
   % hg st -C --rev . --rev 2
   M a
-    b
   A x/y
   R b
   R c
diff --git a/tests/test-rebase-rename.t b/tests/test-rebase-rename.t
--- a/tests/test-rebase-rename.t
+++ b/tests/test-rebase-rename.t
@@ -20,7 +20,10 @@ 
   $ hg ci -Am B
   adding b
 
-  $ hg up -q -C 0
+  $ hg mv b b-renamed
+  $ hg ci -m 'rename B'
+
+  $ hg up -q -C 1
 
   $ hg mv a a-renamed
 
@@ -28,28 +31,32 @@ 
   created new head
 
   $ hg tglog
-  @  2: 'rename A'
+  @  3: 'rename A'
   |
-  | o  1: 'B'
+  | o  2: 'rename B'
   |/
+  o  1: 'B'
+  |
   o  0: 'A'
   
 
 Rename is tracked:
 
   $ hg tlog -p --git -r tip
-  2: 'rename A' 
+  3: 'rename A' 
   diff --git a/a b/a-renamed
   rename from a
   rename to a-renamed
   
 Rebase the revision containing the rename:
 
-  $ hg rebase -s 2 -d 1
+  $ hg rebase -s 3 -d 2
   saved backup bundle to $TESTTMP/a/.hg/strip-backup/*-backup.hg (glob)
 
   $ hg tglog
-  @  2: 'rename A'
+  @  3: 'rename A'
+  |
+  o  2: 'rename B'
   |
   o  1: 'B'
   |
@@ -59,11 +66,32 @@  Rebase the revision containing the renam
 Rename is not lost:
 
   $ hg tlog -p --git -r tip
-  2: 'rename A' 
+  3: 'rename A' 
   diff --git a/a b/a-renamed
   rename from a
   rename to a-renamed
   
+
+Rebased revision does not contain information about b (issue3739)
+
+  $ hg log -r 3 --debug
+  changeset:   3:3b905b1064f14ace3ad02353b79dd42d32981655
+  tag:         tip
+  phase:       draft
+  parent:      2:920a371a5635af23a26a011ca346cecd1cfcb942
+  parent:      -1:0000000000000000000000000000000000000000
+  manifest:    3:c4a62b2b64593c8fe0523d4c1ba2e243a8bd4dce
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  files+:      a-renamed
+  files-:      a
+  extra:       branch=default
+  extra:       rebase_source=89af05cb38a281f891c6f5581dd027092da29166
+  description:
+  rename A
+  
+  
+
   $ cd ..
 
 
@@ -78,47 +106,75 @@  Rename is not lost:
   $ hg ci -Am B
   adding b
 
-  $ hg up -q -C 0
+  $ hg cp b b-copied
+  $ hg ci -Am 'copy B'
+
+  $ hg up -q -C 1
 
   $ hg cp a a-copied
   $ hg ci -m 'copy A'
   created new head
 
   $ hg tglog
-  @  2: 'copy A'
+  @  3: 'copy A'
   |
-  | o  1: 'B'
+  | o  2: 'copy B'
   |/
+  o  1: 'B'
+  |
   o  0: 'A'
   
 Copy is tracked:
 
   $ hg tlog -p --git -r tip
-  2: 'copy A' 
+  3: 'copy A' 
   diff --git a/a b/a-copied
   copy from a
   copy to a-copied
   
 Rebase the revision containing the copy:
 
-  $ hg rebase -s 2 -d 1
+  $ hg rebase -s 3 -d 2
   saved backup bundle to $TESTTMP/b/.hg/strip-backup/*-backup.hg (glob)
 
   $ hg tglog
-  @  2: 'copy A'
+  @  3: 'copy A'
+  |
+  o  2: 'copy B'
   |
   o  1: 'B'
   |
   o  0: 'A'
   
+
 Copy is not lost:
 
   $ hg tlog -p --git -r tip
-  2: 'copy A' 
+  3: 'copy A' 
   diff --git a/a b/a-copied
   copy from a
   copy to a-copied
   
+
+Rebased revision does not contain information about b (issue3739)
+
+  $ hg log -r 3 --debug
+  changeset:   3:98f6e6dbf45ab54079c2237fbd11066a5c41a11d
+  tag:         tip
+  phase:       draft
+  parent:      2:39e588434882ff77d01229d169cdc77f29e8855e
+  parent:      -1:0000000000000000000000000000000000000000
+  manifest:    3:2232f329d66fffe3930d43479ae624f66322b04d
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  files+:      a-copied
+  extra:       branch=default
+  extra:       rebase_source=0a8162ff18a8900df8df8ef7ac0046955205613e
+  description:
+  copy A
+  
+  
+
   $ cd ..