Patchwork D6420: copies: don't filter out copy targets created on other side of merge commit

Submitter phabricator
Date May 22, 2019, 12:32 a.m.
Permalink /patch/40176/
State New


phabricator - May 22, 2019, 12:32 a.m.
martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

  If file X is copied to Y on one side of a merge and the other side
  creates Y (without a copy), we would not mark that as a copy. In the
  changeset-centric pathcopies() version, that was done by checking
  whether the copy target existed on the other branch. Even though merge
  commits are pretty uncommon, it still turned out to be too expensive
  to load the manifests of the parents of merge commits. In a repo of
  mozilla-unified converted to storing copies in changesets, about 2m30s
  of `hg debugpathcopies FIREFOX_BETA_59_END FIREFOX_BETA_60_BASE` is
  spent on this check of merge commits.

  I tried to think of a way of storing more information in the
  changesets in order to cheaply detect these cases, but I couldn't
  think of a solution. So this patch simply removes those checks.

  For reference, these extra copies are reported by the aforementioned
  command after this patch:

    browser/base/content/sanitize.js -> browser/modules/Sanitizer.jsm
    testing/mozbase/mozprocess/tests/process_normal_finish_python.ini -> testing/mozbase/mozprocess/tests/process_normal_finish.ini
    testing/mozbase/mozprocess/tests/process_waittimeout_python.ini -> testing/mozbase/mozprocess/tests/process_waittimeout.ini
    testing/mozbase/mozprocess/tests/process_waittimeout_10s_python.ini -> testing/mozbase/mozprocess/tests/process_waittimeout_10s.ini

  Since these copies were created on one side of some merge, it still
  seems reasonable to include them, so I'm not even sure it's worse than
  filelog pathcopies(), just different.
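
The behavior change described above can be illustrated with a small standalone sketch. This is not the Mercurial code itself: `merge_copies_old` and `merge_copies_new` are hypothetical helper names, plain dicts map copy destination to copy source, and plain sets stand in for the parent manifests. The `match(dst)` filter from the real loop is left out for brevity.

```python
def merge_copies_old(copies1, copies2, p1_files, p2_files):
    """Pre-patch rule (sketch): drop a copy if its target already existed
    on the other parent. p1_files/p2_files stand in for the manifests."""
    copies = {}
    for dst in set(copies1) | set(copies2):
        if dst not in copies2:
            # Copied on the p1 side only: keep it unless p2 already had dst.
            if dst not in p2_files:
                copies[dst] = copies1[dst]
        elif dst not in copies1:
            # Copied on the p2 side only: keep it unless p1 already had dst.
            if dst not in p1_files:
                copies[dst] = copies2[dst]
        else:
            # Copied on both sides: the p1 side wins.
            copies[dst] = copies1[dst]
    return copies


def merge_copies_new(copies1, copies2):
    """Post-patch rule (sketch): no manifest lookups; the p1 side wins
    whenever it recorded a copy, otherwise the p2 copy is used."""
    copies = {}
    for dst in set(copies1) | set(copies2):
        if dst in copies1:
            copies[dst] = copies1[dst]
        else:
            copies[dst] = copies2[dst]
    return copies


# X was copied to Y on the p1 side; the p2 side created Y without a copy.
old = merge_copies_old({"Y": "X"}, {}, {"X", "Y"}, {"X", "Y"})
new = merge_copies_new({"Y": "X"}, {})
print(old)  # the copy is filtered out
print(new)  # the copy is reported
```

The new rule never needs the parent file sets, which in this sketch corresponds to the manifest loads the patch avoids.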

  rHG Mercurial




To: martinvonz, #hg-reviewers
Cc: mercurial-devel


diff --git a/tests/test-copies.t b/tests/test-copies.t
--- a/tests/test-copies.t
+++ b/tests/test-copies.t
@@ -452,8 +452,9 @@ 
   y -> z
   $ hg debugpathcopies 1 3
-Create x and y, then rename x to z on one side of merge, and rename y to z and modify z on the
-other side.
+Create x and y, then rename x to z on one side of merge, and rename y to z and
+modify z on the other side. When storing copies in the changeset, we don't
+filter out copies whose target was created on the other side of the merge.
   $ newrepo
   $ echo x > x
   $ echo y > y
@@ -494,12 +495,16 @@ 
   o  0 add x and y
      x y
   $ hg debugpathcopies 1 4
+  y -> z (no-filelog !)
   $ hg debugpathcopies 2 4
+  x -> z (no-filelog !)
   $ hg debugpathcopies 0 4
   x -> z (filelog !)
   y -> z (compatibility !)
   $ hg debugpathcopies 1 5
+  y -> z (no-filelog !)
   $ hg debugpathcopies 2 5
+  x -> z (no-filelog !)
   $ hg debugpathcopies 0 5
   x -> z
diff --git a/mercurial/ b/mercurial/
--- a/mercurial/
+++ b/mercurial/
@@ -276,27 +276,22 @@ 
             # We are tracing copies from both parents
             r, i2, copies2 = heapq.heappop(work)
             copies = {}
-            ctx = repo[r]
-            p1man, p2man = ctx.p1().manifest(), ctx.p2().manifest()
             allcopies = set(copies1) | set(copies2)
             # TODO: perhaps this filtering should be done as long as ctx
             # is merge, whether or not we're tracing from both parent.
             for dst in allcopies:
                 if not match(dst):
                     continue
-                if dst not in copies2:
-                    # Copied on p1 side: mark as copy from p1 side if it didn't
-                    # already exist on p2 side
-                    if dst not in p2man:
-                        copies[dst] = copies1[dst]
-                elif dst not in copies1:
-                    # Copied on p2 side: mark as copy from p2 side if it didn't
-                    # already exist on p1 side
-                    if dst not in p1man:
-                        copies[dst] = copies2[dst]
+                # Unlike when copies are stored in the filelog, we consider
+                # it a copy even if the destination already existed on the
+                # other branch. It's simply too expensive to check if the
+                # file existed in the manifest.
+                if dst in copies1:
+                    # If it was copied on the p1 side, mark it as copied from
+                    # that side, even if it was also copied on the p2 side.
+                    copies[dst] = copies1[dst]
                 else:
-                    # Copied on both sides: mark as copy from p1 side
-                    copies[dst] = copies1[dst]
+                    copies[dst] = copies2[dst]
         else:
             copies = copies1
         if r == b.rev():