Patchwork D6255: copies: calculate mergecopies() based on pathcopies()

login
register
mail settings
Submitter phabricator
Date April 16, 2019, 5:19 p.m.
Message ID <differential-rev-PHID-DREV-dx4bfgotzvzt4pllt6g2-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/39655/
State Superseded
Headers show

Comments

phabricator - April 16, 2019, 5:19 p.m.
martinvonz created this revision.
Herald added subscribers: mercurial-devel, mjpieters.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  When copies are stored in changesets, we need a changeset-centric
  version of mergecopies() just like we have a changeset-centric version
  of pathcopies(). I think the natural way of thinking about
  mergecopies() is in terms of pathcopies() from the base to each of the
  commits. So if we can rewrite mergecopies() based two such
  pathcopies() calls, we'll get the changeset-centric version for
  free. That's what this patch does.
  
  A nice bonus is that it ends up being a lot simpler. mergecopies() has
  accumulated a lot of technical debt over time. One good example is the
  code for dealing with grafts (the "partial/incomplete/dirty"
  stuff). Since pathcopies() already deals with backwards renames and
  ping-pong renames, we get that for free.
  
  I've run tests with hard-coded debug logging for "fullcopy" and while
  I haven't looked at every difference it produces, all the ones I have
  looked at seemed reasonable to me.
  
  One drawback of the rewritten code is that we may now make
  remotefilelog prefetch more files. We used to prefetch files that were
  unique to either side of the merge compared to the other. We now
  prefetch files that are unique to either sise of the merge compared to
  the base. This means that if you added the same file to each side, we
  would not prefetch it before, but we would now. Such cases are
  probably quite rare, but one likely scenario where they happen is when
  moving from a commit to its successor (or the other way around). The
  user will probably already have the files in the cache in such cases,
  so it's probably not a big deal.
  
  Some timings for calculating mergecopies between two revisions (all
  using the common ancestor as base):
  
  In the hg repo:
  4.8 4.8: 0.21s -> 0.21s
  4.0 4.8: 0.35s -> 0.63s
  
  In and old copy of the mozilla-unified repo:
  FIREFOX_BETA_60_BASE^ FIREFOX_BETA_60_BASE: 0.51s -> 0.60s
  FIREFOX_NIGHTLY_59_END FIREFOX_BETA_60_BASE: 2.1s -> 2.3s
  FIREFOX_BETA_59_END FIREFOX_BETA_60_BASE: 3.1s -> 3.3s
  FIREFOX_AURORA_50_BASE FIREFOX_BETA_60_BASE: 30s -> 35s
  
  So it's measurably slower in most cases. Note that merge copies are
  not calculated when updating with a clean working copy, which is
  probably the most common case. I therefore think the much simpler code
  is worth the slowdown.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

AFFECTED FILES
  mercurial/copies.py
  tests/test-annotate.t
  tests/test-fastannotate-hg.t
  tests/test-graft.t
  tests/test-rename-merge2.t

CHANGE DETAILS




To: martinvonz, #hg-reviewers
Cc: mjpieters, mercurial-devel
phabricator - April 16, 2019, 7:02 p.m.
marmoute added a comment.


  I did a first path through it, the new code seems reasonable and easier 
  to read than the previous one. Some comments and questions below.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
Pierre-Yves David - April 16, 2019, 8:02 p.m.
On 4/16/19 9:02 PM, marmoute (Pierre-Yves David) wrote:
> marmoute added a comment.
> 
> 
>    I did a first path through it, the new code seems reasonable and easier
>    to read than the previous one. Some comments and questions below.


The actual comments are visible here:

 
https://www.mercurial-scm.org/pipermail/mercurial-devel/2019-April/130321.html
phabricator - April 16, 2019, 8:32 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#91019, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
  >
  > > I did a first path through it, the new code seems reasonable and easier 
  > >  to read than the previous one. Some comments and questions below.
  >
  >
  > The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me):
  
  
  Ah, the reason for that might be that Phabricator was down at the time of the email.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 16, 2019, 9:05 p.m.
martinvonz added a comment.


  Replying to just a few things now. Will reply to the rest later.
  
  In https://phab.mercurial-scm.org/D6255#91019, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
  >
  > > I did a first path through it, the new code seems reasonable and easier 
  > >  to read than the previous one. Some comments and questions below.
  >
  >
  > The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me):
  >
  > I did a first path through it, the new code seems reasonable and easier 
  >  to read than the previous one. Some comments and questions below.
  
  
  Thanks for reviewing!
  
  >>     I've run tests with hard-coded debug logging for "fullcopy" and while
  >>     I haven't looked at every difference it produces, all the ones I have
  >>     looked at seemed reasonable to me.
  > 
  > How many difference did you had? can you share some example of them with 
  >  their explanation?
  
  Without explanation :), see http://paste.debian.net/1077862/
  
  It just seemed to long to include in the commit message.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 16, 2019, 9:08 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#91019, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
  >
  > > I did a first path through it, the new code seems reasonable and easier 
  > >  to read than the previous one. Some comments and questions below.
  >
  >
  > The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me):
  >
  > >     Some timings for calculating mergecopies between two revisions (all
  > >     using the common ancestor as base):
  >
  > Which revisions did you pick?
  
  
  The revisions are those before the colon, so e.g. from hg version 4.0 to 4.8. Oops, the first line there should say "4.8 4.9", not "4.8 4.8". I've fixed that now.
  
  >>     In the hg repo:
  >>     4.8 4.8: 0.21s -> 0.21s
  >>     4.0 4.8: 0.35s -> 0.63s
  >>     
  >>     In and old copy of the mozilla-unified repo:
  >>     FIREFOX_BETA_60_BASE^ FIREFOX_BETA_60_BASE: 0.51s -> 0.60s
  >>     FIREFOX_NIGHTLY_59_END FIREFOX_BETA_60_BASE: 2.1s -> 2.3s
  >>     FIREFOX_BETA_59_END FIREFOX_BETA_60_BASE: 3.1s -> 3.3s
  >>     FIREFOX_AURORA_50_BASE FIREFOX_BETA_60_BASE: 30s -> 35s

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
Pierre-Yves David - April 16, 2019, 9:30 p.m.
On 4/16/19 11:08 PM, martinvonz (Martin von Zweigbergk) wrote:
> martinvonz added a comment.
> 
> 
>    In https://phab.mercurial-scm.org/D6255#91019, @martinvonz wrote:
>    
>    > In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
>    >
>    > > I did a first path through it, the new code seems reasonable and easier
>    > >  to read than the previous one. Some comments and questions below.
>    >
>    >
>    > The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me):
>    >
>    > >     Some timings for calculating mergecopies between two revisions (all
>    > >     using the common ancestor as base):
>    >
>    > Which revisions did you pick?
>    
>    
>    The revisions are those before the colon, so e.g. from hg version 4.0 to 4.8. Oops, the first line there should say "4.8 4.9", not "4.8 4.8". I've fixed that now.
Can you give that a shot with the two revisions we use in the benchmark 
suite, this is a pair expensive with the current algorithm.

1daa622bbe42 76caed42cf7c
phabricator - April 16, 2019, 10:33 p.m.
martinvonz added a comment.


  From Pierre-Yves:
  
  > Can you give that a shot with the two revisions we use in the benchmark 
  >  suite, this is a pair expensive with the current algorithm.
  > 
  > 1daa622bbe42 76caed42cf7c
  
  Sure, it takes about 29 seconds with or without this patch. It seems the difference is smaller than the noise level.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 17, 2019, 10:08 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#91146, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D6255#91019, @martinvonz wrote:
  >
  > > In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote:
  > >
  > > >
  > >
  > >
  > > Since this seems the very same code as the previous clause, I wonder if 
  > >  we could factor it out. This would help to prevent subtle bug when we 
  > >  update it. (the answer might be "no because python is slow).
  >
  >
  > Yes, I also considered that. I wasn't sure what a good name for the method would be and I gave up. I'll try again.
  
  
  I've picked a name and done the refactoring now.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 27, 2019, 7:21 a.m.
martinvonz planned changes to this revision.
martinvonz added a comment.


  I'll spend a bit more time to see if I can figure out why pathcopies() and mergecopies() walk file ancestor differently. The way mergecopies() does it is faster, so I'l see if I can use that for pathcopies() too. https://phab.mercurial-scm.org/D6274:: can still be queued if anyone has time.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 28, 2019, 7:35 a.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#91851, @martinvonz wrote:
  
  > I'll spend a bit more time to see if I can figure out why pathcopies() and mergecopies() walk file ancestor differently. The way mergecopies() does it is faster, so I'l see if I can use that for pathcopies() too. https://phab.mercurial-scm.org/D6274:: can still be queued if anyone has time.
  
  
  I thought I was done with that after finding some bugs in mergecopies(). I thought fixing those would make mergecopies() as slow as pathcopies(), but that still doesn't seem to explain it :( Maybe I'll spend even more time on this tomorrow.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - April 29, 2019, 10:56 p.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#91965, @martinvonz wrote:
  
  > In https://phab.mercurial-scm.org/D6255#91851, @martinvonz wrote:
  >
  > > I'll spend a bit more time to see if I can figure out why pathcopies() and mergecopies() walk file ancestor differently. The way mergecopies() does it is faster, so I'l see if I can use that for pathcopies() too. https://phab.mercurial-scm.org/D6274:: can still be queued if anyone has time.
  >
  >
  > I thought I was done with that after finding some bugs in mergecopies(). I thought fixing those would make mergecopies() as slow as pathcopies(), but that still doesn't seem to explain it :( Maybe I'll spend even more time on this tomorrow.
  
  
  The biggest difference turned out to come from the `isintruducedafter()` that I mentioned earlier. I'd be fine with removing that, but we can discuss that after this patch is landed. I think it's an improvement to make pathcopies() and mergecopies() more consistent anyway.
  
  While investigating differences between pathcopies() and mergecopies(), I noticed some other differences and I've added tests for them. As you can see in this patch, some of them are now fixed.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: marmoute, mjpieters, mercurial-devel
phabricator - May 7, 2019, 3:41 a.m.
mharbison72 added a comment.


  This might break `--pure` without `--local` in the annotate tests.  No idea if that's a valid combination, but the buildbots (mostly) use that.  In fairness, it seems that this combination had an error where `_filecommit()` was given too many arguments in the direct ancestors, so maybe the real breakage occurred in there.  But there seems to be extra output here.
  
  https://buildbot.mercurial-scm.org/builders/macOS%2010.12%20hg%20tests/builds/828/steps/pure/logs/stdio
  
  It seems similar to `--pure` without `--local` failing in test-repo-compengines.t recently.
  
  https://www.mercurial-scm.org/pipermail/mercurial-devel/2019-April/130762.html

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: mharbison72, marmoute, mjpieters, mercurial-devel
phabricator - May 7, 2019, 4:54 a.m.
martinvonz added a comment.


  In https://phab.mercurial-scm.org/D6255#92356, @mharbison72 wrote:
  
  > This might break `--pure` without `--local` in the annotate tests.  No idea if that's a valid combination, but the buildbots (mostly) use that.  In fairness, it seems that this combination had an error where `_filecommit()` was given too many arguments in the direct ancestors, so maybe the real breakage occurred in there.  But there seems to be extra output here.
  >
  > https://buildbot.mercurial-scm.org/builders/macOS%2010.12%20hg%20tests/builds/828/steps/pure/logs/stdio
  >
  > It seems similar to `--pure` without `--local` failing in test-repo-compengines.t recently.
  >
  > https://www.mercurial-scm.org/pipermail/mercurial-devel/2019-April/130762.html
  
  
  It's apparently the pure file merge code (simplemerge.py, I think) that works differently. The common ancestor's content is:
  
    a
    a
    a
  
  The local side is:
  
    a
    z
    a
  
  The other side is:
  
    a
    a
    a
    b4
    c
    b6
  
  The pure version thinks they conflict and gives result:
  
    a
    z
    a
    <<<<<<< working copy: b80e3e32f75a - test: c
    ||||||| base
    a
    =======
    a
    b4
    c
    b5
    >>>>>>> merge rev:    64afcdf8e29e - test: mergeb
  
  I'll see if I can work around it by changing the contents of the file a bit.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6255

To: martinvonz, #hg-reviewers
Cc: mharbison72, marmoute, mjpieters, mercurial-devel

Patch

diff --git a/tests/test-rename-merge2.t b/tests/test-rename-merge2.t
--- a/tests/test-rename-merge2.t
+++ b/tests/test-rename-merge2.t
@@ -433,6 +433,9 @@ 
   --------------
   test L:nc a b R:up b   W:       - 12 merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 86a2aa42fc76+, remote: af30c7647fc7
@@ -469,6 +472,9 @@ 
   --------------
   test L:up b   R:nm a b W:       - 13 merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a
@@ -506,6 +512,9 @@ 
   --------------
   test L:nc a b R:up a b W:       - 14 merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a
@@ -543,6 +552,9 @@ 
   --------------
   test L:up b   R:nm a b W:       - 15 merge b no ancestor, remove a
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a
@@ -580,6 +592,9 @@ 
   --------------
   test L:nc a b R:up a b W:       - 16 get a, merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a
@@ -617,6 +632,9 @@ 
   --------------
   test L:up a b R:nc a b W:       - 17 keep a, merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 0b76e65c8289+, remote: 4ce40f5aca24
@@ -653,6 +671,9 @@ 
   --------------
   test L:nm a b R:up a b W:       - 18 merge b no ancestor
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 02963e448370+, remote: 8dbce441892a
@@ -695,6 +716,9 @@ 
   --------------
   test L:up a b R:nm a b W:       - 19 merge b no ancestor, prompt remove a
   --------------
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: False, partial: False
    ancestor: 924404dff337, local: 0b76e65c8289+, remote: bdb19105162a
diff --git a/tests/test-graft.t b/tests/test-graft.t
--- a/tests/test-graft.t
+++ b/tests/test-graft.t
@@ -75,6 +75,8 @@ 
 
   $ hg graft -r 2 --base 3
   grafting 2:5c095ad7e90f "2"
+  note: possible conflict - c was deleted and renamed to:
+   a
   note: graft of 2:5c095ad7e90f created no changes to commit
 
 Can't continue without starting:
@@ -220,6 +222,9 @@ 
   committing changelog
   updating the branch cache
   grafting 5:97f8bfe72746 "5"
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'c' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: True, partial: False
    ancestor: 4c60f11aa304, local: 6b9e5368ca4e+, remote: 97f8bfe72746
@@ -233,6 +238,9 @@ 
   $ HGEDITOR=cat hg graft 4 3 --log --debug
   scanning for duplicate grafts
   grafting 4:9c233e8e184d "4"
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'c' -> dst: 'b' 
+    checking for directory renames
   resolving manifests
    branchmerge: True, force: True, partial: False
    ancestor: 4c60f11aa304, local: 1905859650ec+, remote: 9c233e8e184d
@@ -1129,7 +1137,6 @@ 
   grafting 2:f58c7e2b28fa "C0"
   merging f1e and f1b to f1e
   merging f2a and f2c to f2c
-  merging f5b and f5a to f5a
 
 Test the cases A.1 (f4x) and A.7 (f3x).
 
diff --git a/tests/test-fastannotate-hg.t b/tests/test-fastannotate-hg.t
--- a/tests/test-fastannotate-hg.t
+++ b/tests/test-fastannotate-hg.t
@@ -273,37 +273,10 @@ 
   > EOF
   $ hg ci -mc -d '3 0'
   created new head
-BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here
   $ hg merge
   merging b
-  warning: conflicts while merging b! (edit, then use 'hg resolve --mark')
-  0 files updated, 0 files merged, 0 files removed, 1 files unresolved
-  use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon
-  [1]
-  $ cat b
-  <<<<<<< working copy: b80e3e32f75a - test: c
-  a
-  z
-  a
-  ||||||| base
-  =======
-  a
-  a
-  a
-  b4
-  c
-  b5
-  >>>>>>> merge rev:    64afcdf8e29e - test: mergeb
-  $ cat <<EOF > b
-  > a
-  > z
-  > a
-  > b4
-  > c
-  > b5
-  > EOF
-  $ hg resolve --mark -q
-  $ rm b.orig
+  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
+  (branch merge, don't forget to commit)
   $ echo d >> b
   $ hg ci -mmerge2 -d '4 0'
 
diff --git a/tests/test-annotate.t b/tests/test-annotate.t
--- a/tests/test-annotate.t
+++ b/tests/test-annotate.t
@@ -273,37 +273,10 @@ 
   > EOF
   $ hg ci -mc -d '3 0'
   created new head
-BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here
   $ hg merge
   merging b
-  warning: conflicts while merging b! (edit, then use 'hg resolve --mark')
-  0 files updated, 0 files merged, 0 files removed, 1 files unresolved
-  use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon
-  [1]
-  $ cat b
-  <<<<<<< working copy: b80e3e32f75a - test: c
-  a
-  z
-  a
-  ||||||| base
-  =======
-  a
-  a
-  a
-  b4
-  c
-  b5
-  >>>>>>> merge rev:    64afcdf8e29e - test: mergeb
-  $ cat <<EOF > b
-  > a
-  > z
-  > a
-  > b4
-  > c
-  > b5
-  > EOF
-  $ hg resolve --mark -q
-  $ rm b.orig
+  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
+  (branch merge, don't forget to commit)
   $ echo d >> b
   $ hg ci -mmerge2 -d '4 0'
 
diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -373,57 +373,6 @@ 
 
     return u1, u2
 
-def _makegetfctx(ctx):
-    """return a 'getfctx' function suitable for _checkcopies usage
-
-    We have to re-setup the function building 'filectx' for each
-    '_checkcopies' to ensure the linkrev adjustment is properly setup for
-    each. Linkrev adjustment is important to avoid bug in rename
-    detection. Moreover, having a proper '_ancestrycontext' setup ensures
-    the performance impact of this adjustment is kept limited. Without it,
-    each file could do a full dag traversal making the time complexity of
-    the operation explode (see issue4537).
-
-    This function exists here mostly to limit the impact on stable. Feel
-    free to refactor on default.
-    """
-    rev = ctx.rev()
-    repo = ctx._repo
-    ac = getattr(ctx, '_ancestrycontext', None)
-    if ac is None:
-        revs = [rev]
-        if rev is None:
-            revs = [p.rev() for p in ctx.parents()]
-        ac = repo.changelog.ancestors(revs, inclusive=True)
-        ctx._ancestrycontext = ac
-    def makectx(f, n):
-        if n in node.wdirfilenodeids:  # in a working context?
-            if ctx.rev() is None:
-                return ctx.filectx(f)
-            return repo[None][f]
-        fctx = repo.filectx(f, fileid=n)
-        # setup only needed for filectx not create from a changectx
-        fctx._ancestrycontext = ac
-        fctx._descendantrev = rev
-        return fctx
-    return util.lrucachefunc(makectx)
-
-def _combinecopies(copyfrom, copyto, finalcopy, diverge, incompletediverge):
-    """combine partial copy paths"""
-    remainder = {}
-    for f in copyfrom:
-        if f in copyto:
-            finalcopy[copyto[f]] = copyfrom[f]
-            del copyto[f]
-    for f in incompletediverge:
-        assert f not in diverge
-        ic = incompletediverge[f]
-        if ic[0] in copyto:
-            diverge[f] = [copyto[ic[0]], ic[1]]
-        else:
-            remainder[f] = ic
-    return remainder
-
 def mergecopies(repo, c1, c2, base):
     """
     Finds moves and copies between context c1 and c2 that are relevant for
@@ -526,168 +475,93 @@ 
     This is pretty slow when a lot of changesets are involved but will track all
     the copies.
     """
-    # In certain scenarios (e.g. graft, update or rebase), base can be
-    # overridden We still need to know a real common ancestor in this case We
-    # can't just compute _c1.ancestor(_c2) and compare it to ca, because there
-    # can be multiple common ancestors, e.g. in case of bidmerge.  Because our
-    # caller may not know if the revision passed in lieu of the CA is a genuine
-    # common ancestor or not without explicitly checking it, it's better to
-    # determine that here.
-    #
-    # base.isancestorof(wc) is False, work around that
-    _c1 = c1.p1() if c1.rev() is None else c1
-    _c2 = c2.p1() if c2.rev() is None else c2
-    # an endpoint is "dirty" if it isn't a descendant of the merge base
-    # if we have a dirty endpoint, we need to trigger graft logic, and also
-    # keep track of which endpoint is dirty
-    dirtyc1 = not base.isancestorof(_c1)
-    dirtyc2 = not base.isancestorof(_c2)
-    graft = dirtyc1 or dirtyc2
-    tca = base
-    if graft:
-        tca = _c1.ancestor(_c2)
-
-    limit = _findlimit(repo, c1, c2)
-
     m1 = c1.manifest()
     m2 = c2.manifest()
     mb = base.manifest()
 
-    # gather data from _checkcopies:
-    # - diverge = record all diverges in this dict
-    # - copy = record all non-divergent copies in this dict
-    # - fullcopy = record all copies in this dict
-    # - incomplete = record non-divergent partial copies here
-    # - incompletediverge = record divergent partial copies here
-    diverge = {} # divergence data is shared
-    incompletediverge  = {}
-    data1 = {'copy': {},
-             'fullcopy': {},
-             'incomplete': {},
-             'diverge': diverge,
-             'incompletediverge': incompletediverge,
-            }
-    data2 = {'copy': {},
-             'fullcopy': {},
-             'incomplete': {},
-             'diverge': diverge,
-             'incompletediverge': incompletediverge,
-            }
+    copies1 = pathcopies(base, c1)
+    copies2 = pathcopies(base, c2)
+
+    inversecopies1 = {}
+    inversecopies2 = {}
+    for dst, src in copies1.items():
+        inversecopies1.setdefault(src, []).append(dst)
+    for dst, src in copies2.items():
+        inversecopies2.setdefault(src, []).append(dst)
+
+    copy = {}
+    diverge = {}
+    renamedelete = {}
+    allsources = set(inversecopies1) | set(inversecopies2)
+    for src in allsources:
+        dsts1 = inversecopies1.get(src)
+        dsts2 = inversecopies2.get(src)
+        if dsts1 and dsts2:
+            # copied/renamed on both sides
+            if src not in m1 and src not in m2:
+                # renamed on both sides
+                dsts1 = set(dsts1)
+                dsts2 = set(dsts2)
+                # If there's some overlap in the rename destinations, we
+                # consider it not divergent. For example, if side 1 copies 'a'
+                # to 'b' and 'c' and deletes 'a', and side 2 copies 'a' to 'c'
+                # and 'd' and deletes 'a'.
+                if dsts1 & dsts2:
+                    for dst in (dsts1 & dsts2):
+                        copy[dst] = src
+                else:
+                    diverge[src] = sorted(dsts1 | dsts2)
+            elif src in m1 and src in m2:
+                # copied on both sides
+                dsts1 = set(dsts1)
+                dsts2 = set(dsts2)
+                for dst in (dsts1 & dsts2):
+                    copy[dst] = src
+            # TODO: Handle cases where it was renamed on one side and copied
+            # on the other side
+        elif dsts1:
+            # copied/renamed only on side 1
+            if src not in m2:
+                # deleted on side 2
+                if src not in m1:
+                    # renamed on side 1, deleted on side 2
+                    renamedelete[src] = dsts1
+            elif m2[src] != mb[src]:
+                # modified on side 2
+                for dst in dsts1:
+                    if dst not in m2:
+                        # dst not added on side 2 (handle as regular
+                        # "both created" case in manifestmerge in that case)
+                        copy[dst] = src
+        elif dsts2:
+            # copied/renamed only on side 2
+            if src not in m1:
+                # deleted on side 1
+                if src not in m2:
+                    # renamed on side 2, deleted on side 1
+                    renamedelete[src] = dsts2
+            elif m1[src] != mb[src]:
+                # modified on side 1
+                for dst in dsts2:
+                    if dst not in m1:
+                        # dst not added on side 1 (handle as regular
+                        # "both created" case in manifestmerge in that case)
+                        copy[dst] = src
+
+    renamedeleteset = set()
+    divergeset = set()
+    for src, dsts in diverge.items():
+        divergeset.update(dsts)
+    for src, dsts in renamedelete.items():
+        renamedeleteset.update(dsts)
 
     # find interesting file sets from manifests
     addedinm1 = m1.filesnotin(mb, repo.narrowmatch())
     addedinm2 = m2.filesnotin(mb, repo.narrowmatch())
-    bothnew = sorted(addedinm1 & addedinm2)
-    if tca == base:
-        # unmatched file from base
-        u1r, u2r = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
-        u1u, u2u = u1r, u2r
-    else:
-        # unmatched file from base (DAG rotation in the graft case)
-        u1r, u2r = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
-        # unmatched file from topological common ancestors (no DAG rotation)
-        # need to recompute this for directory move handling when grafting
-        mta = tca.manifest()
-        u1u, u2u = _computenonoverlap(repo, c1, c2,
-                                      m1.filesnotin(mta, repo.narrowmatch()),
-                                      m2.filesnotin(mta, repo.narrowmatch()),
-                                      debug=False)
-
-    for f in u1u:
-        _checkcopies(c1, c2, f, base, tca, dirtyc1, limit, data1)
-
-    for f in u2u:
-        _checkcopies(c2, c1, f, base, tca, dirtyc2, limit, data2)
-
-    copy = dict(data1['copy'])
-    copy.update(data2['copy'])
-    fullcopy = dict(data1['fullcopy'])
-    fullcopy.update(data2['fullcopy'])
-
-    if dirtyc1:
-        _combinecopies(data2['incomplete'], data1['incomplete'], copy, diverge,
-                       incompletediverge)
-    if dirtyc2:
-        _combinecopies(data1['incomplete'], data2['incomplete'], copy, diverge,
-                       incompletediverge)
-
-    renamedelete = {}
-    renamedeleteset = set()
-    divergeset = set()
-    for of, fl in list(diverge.items()):
-        if len(fl) == 1 or of in c1 or of in c2:
-            del diverge[of] # not actually divergent, or not a rename
-            if of not in c1 and of not in c2:
-                # renamed on one side, deleted on the other side, but filter
-                # out files that have been renamed and then deleted
-                renamedelete[of] = [f for f in fl if f in c1 or f in c2]
-                renamedeleteset.update(fl) # reverse map for below
-        else:
-            divergeset.update(fl) # reverse map for below
+    u1, u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
 
-    bothdiverge = {}
-    bothincompletediverge = {}
-    remainder = {}
-    both1 = {'copy': {},
-             'fullcopy': {},
-             'incomplete': {},
-             'diverge': bothdiverge,
-             'incompletediverge': bothincompletediverge
-            }
-    both2 = {'copy': {},
-             'fullcopy': {},
-             'incomplete': {},
-             'diverge': bothdiverge,
-             'incompletediverge': bothincompletediverge
-            }
-    for f in bothnew:
-        _checkcopies(c1, c2, f, base, tca, dirtyc1, limit, both1)
-        _checkcopies(c2, c1, f, base, tca, dirtyc2, limit, both2)
-    if dirtyc1 and dirtyc2:
-        remainder = _combinecopies(both2['incomplete'], both1['incomplete'],
-                                   copy, bothdiverge, bothincompletediverge)
-        remainder1 = _combinecopies(both1['incomplete'], both2['incomplete'],
-                                   copy, bothdiverge, bothincompletediverge)
-        remainder.update(remainder1)
-    elif dirtyc1:
-        # incomplete copies may only be found on the "dirty" side for bothnew
-        assert not both2['incomplete']
-        remainder = _combinecopies({}, both1['incomplete'], copy, bothdiverge,
-                                   bothincompletediverge)
-    elif dirtyc2:
-        assert not both1['incomplete']
-        remainder = _combinecopies({}, both2['incomplete'], copy, bothdiverge,
-                                   bothincompletediverge)
-    else:
-        # incomplete copies and divergences can't happen outside grafts
-        assert not both1['incomplete']
-        assert not both2['incomplete']
-        assert not bothincompletediverge
-    for f in remainder:
-        assert f not in bothdiverge
-        ic = remainder[f]
-        if ic[0] in (m1 if dirtyc1 else m2):
-            # backed-out rename on one side, but watch out for deleted files
-            bothdiverge[f] = ic
-    for of, fl in bothdiverge.items():
-        if len(fl) == 2 and fl[0] == fl[1]:
-            copy[fl[0]] = of # not actually divergent, just matching renames
-
-    # Sometimes we get invalid copies here (the "and not remotebase" in
-    # _checkcopies() seems suspicious). Filter them out.
-    for dst, src in fullcopy.copy().items():
-        if src not in mb:
-            del fullcopy[dst]
-    # Sometimes we forget to add entries from "copy" to "fullcopy", so fix
-    # that up here
-    for dst, src in copy.items():
-        fullcopy[dst] = src
-    # Sometimes we forget to add entries from "diverge" to "fullcopy", so fix
-    # that up here
-    for src, dsts in diverge.items():
-        for dst in dsts:
-            fullcopy[dst] = src
-
+    fullcopy = copies1.copy()
+    fullcopy.update(copies2)
     if not fullcopy:
         return copy, {}, diverge, renamedelete, {}
 
@@ -752,7 +626,7 @@ 
 
     movewithdir = {}
     # check unaccounted nonoverlapping files against directory moves
-    for f in u1r + u2r:
+    for f in u1 + u2:
         if f not in fullcopy:
             for d in dirmove:
                 if f.startswith(d):
@@ -899,99 +773,6 @@ 
     except StopIteration:
         return False
 
-def _checkcopies(srcctx, dstctx, f, base, tca, remotebase, limit, data):
-    """
-    check possible copies of f from msrc to mdst
-
-    srcctx = starting context for f in msrc
-    dstctx = destination context for f in mdst
-    f = the filename to check (as in msrc)
-    base = the changectx used as a merge base
-    tca = topological common ancestor for graft-like scenarios
-    remotebase = True if base is outside tca::srcctx, False otherwise
-    limit = the rev number to not search beyond
-    data = dictionary of dictionary to store copy data. (see mergecopies)
-
-    note: limit is only an optimization, and provides no guarantee that
-    irrelevant revisions will not be visited
-    there is no easy way to make this algorithm stop in a guaranteed way
-    once it "goes behind a certain revision".
-    """
-
-    msrc = srcctx.manifest()
-    mdst = dstctx.manifest()
-    mb = base.manifest()
-    mta = tca.manifest()
-    # Might be true if this call is about finding backward renames,
-    # This happens in the case of grafts because the DAG is then rotated.
-    # If the file exists in both the base and the source, we are not looking
-    # for a rename on the source side, but on the part of the DAG that is
-    # traversed backwards.
-    #
-    # In the case there is both backward and forward renames (before and after
-    # the base) this is more complicated as we must detect a divergence.
-    # We use 'backwards = False' in that case.
-    backwards = not remotebase and base != tca and f in mb
-    getsrcfctx = _makegetfctx(srcctx)
-    getdstfctx = _makegetfctx(dstctx)
-
-    if msrc[f] == mb.get(f) and not remotebase:
-        # Nothing to merge
-        return
-
-    of = None
-    seen = {f}
-    for oc in getsrcfctx(f, msrc[f]).ancestors():
-        of = oc.path()
-        if of in seen:
-            # check limit late - grab last rename before
-            if oc.linkrev() < limit:
-                break
-            continue
-        seen.add(of)
-
-        # remember for dir rename detection
-        if backwards:
-            data['fullcopy'][of] = f # grafting backwards through renames
-        else:
-            data['fullcopy'][f] = of
-        if of not in mdst:
-            continue # no match, keep looking
-        if mdst[of] == mb.get(of):
-            return # no merge needed, quit early
-        c2 = getdstfctx(of, mdst[of])
-        # c2 might be a plain new file on added on destination side that is
-        # unrelated to the droids we are looking for.
-        cr = _related(oc, c2)
-        if cr and (of == f or of == c2.path()): # non-divergent
-            if backwards:
-                data['copy'][of] = f
-            elif of in mb:
-                data['copy'][f] = of
-            elif remotebase: # special case: a <- b <- a -> b "ping-pong" rename
-                data['copy'][of] = f
-                del data['fullcopy'][f]
-                data['fullcopy'][of] = f
-            else: # divergence w.r.t. graft CA on one side of topological CA
-                for sf in seen:
-                    if sf in mb:
-                        assert sf not in data['diverge']
-                        data['diverge'][sf] = [f, of]
-                        break
-            return
-
-    if of in mta:
-        if backwards or remotebase:
-            data['incomplete'][of] = f
-        else:
-            for sf in seen:
-                if sf in mb:
-                    if tca == base:
-                        data['diverge'].setdefault(sf, []).append(f)
-                    else:
-                        data['incompletediverge'][sf] = [of, f]
-                    return
-
 def duplicatecopies(repo, wctx, rev, fromrev, skiprev=None):
     """reproduce copies from fromrev to rev in the dirstate