Patchwork D9114: copies: make two version of the changeset centric algorithm

Submitter phabricator
Date Sept. 28, 2020, 4:32 p.m.
Message ID <differential-rev-PHID-DREV-wify6gxxfhavrkz4vnne-req@mercurial-scm.org>
Permalink /patch/47329/
State Superseded

Comments

phabricator - Sept. 28, 2020, 4:32 p.m.
marmoute created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  There are two main ways to run the changeset-centric copy-tracing algorithm:
  one fed from data stored in side-data and still in development, and one based
  on data stored in extra (with a "compatibility" mode).
  
  The `extra`-based version is used in production at Google, but is still
  experimental in the code. It is mostly unsuitable for other users because it
  affects the hash.
  
  The side-data based storage and algorithm have been evolving to store more
  data, cover more cases (mostly around merges, which Google does not really
  care about) and use lower-level storage for efficiency.
  
  All these changes make it increasingly hard to maintain a common code base
  without impacting code complexity and performance. For example, the
  compatibility mode requires keeping things at a different level than what we
  need for side-data.
  
  So, I am duplicating the involved functions. The newly added `_extra`
  variants will be kept as they are today, while I do some deeper rework of
  the side-data versions.
  
  Long term, the side-data version should be more featureful and performant
  than the extra-based version, so I expect the duplicated `_extra` functions
  to eventually be dropped.
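  The core of the duplicated `_merge_copies_dict_extra` helper in the patch
  below is its conflict rule: entries from the "major" side win unless they are
  older than the branch point and the destination was not merged. A minimal,
  self-contained sketch of that rule (with stand-in `isancestor`/`ismerged`
  callables instead of real repository queries):

```python
# Simplified sketch of the merge rule from _merge_copies_dict_extra.
# Entries map dest -> (rev, source); `isancestor` and `ismerged` are
# hypothetical stand-ins for the real repository queries.

def merge_copies_dict(minor, major, isancestor, ismerged):
    """Merge `major` into `minor` in place, major winning on conflict
    unless its entry predates the branch point and dest is unmerged."""
    for dest, value in major.items():
        other = minor.get(dest)
        if other is None:
            minor[dest] = value
        elif value[1] != other[1]:
            new_rev, other_rev = value[0], other[0]
            # content from "major" wins, unless it is older than the
            # branch point or there is a merge
            if (
                new_rev == other_rev
                or not isancestor(new_rev, other_rev)
                or ismerged(dest)
            ):
                minor[dest] = value

minor = {'a.txt': (1, 'old.txt')}
major = {'a.txt': (3, 'new.txt'), 'b.txt': (3, 'src.txt')}
# pretend rev 3 is not an ancestor of rev 1, and nothing is a merge
merge_copies_dict(minor, major, lambda a, b: False, lambda f: False)
print(minor)  # {'a.txt': (3, 'new.txt'), 'b.txt': (3, 'src.txt')}
```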

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D9114

AFFECTED FILES
  mercurial/copies.py

CHANGE DETAILS




To: marmoute, #hg-reviewers
Cc: mercurial-patches, mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -309,9 +309,15 @@ 
     iterrevs.update(roots)
     iterrevs.remove(b.rev())
     revs = sorted(iterrevs)
-    return _combine_changeset_copies(
-        revs, children, b.rev(), revinfo, match, isancestor
-    )
+
+    if repo.filecopiesmode == b'changeset-sidedata':
+        return _combine_changeset_copies(
+            revs, children, b.rev(), revinfo, match, isancestor
+        )
+    else:
+        return _combine_changeset_copies_extra(
+            revs, children, b.rev(), revinfo, match, isancestor
+        )
 
 
 def _combine_changeset_copies(
@@ -422,6 +428,98 @@ 
                 minor[dest] = value
 
 
+def _combine_changeset_copies_extra(
+    revs, children, targetrev, revinfo, match, isancestor
+):
+    """version of `_combine_changeset_copies` that works with the Google
+    specific "extra" based storage for copy information"""
+    all_copies = {}
+    alwaysmatch = match.always()
+    for r in revs:
+        copies = all_copies.pop(r, None)
+        if copies is None:
+            # this is a root
+            copies = {}
+        for i, c in enumerate(children[r]):
+            p1, p2, p1copies, p2copies, removed, ismerged = revinfo(c)
+            if r == p1:
+                parent = 1
+                childcopies = p1copies
+            else:
+                assert r == p2
+                parent = 2
+                childcopies = p2copies
+            if not alwaysmatch:
+                childcopies = {
+                    dst: src for dst, src in childcopies.items() if match(dst)
+                }
+            newcopies = copies
+            if childcopies:
+                newcopies = copies.copy()
+                for dest, source in pycompat.iteritems(childcopies):
+                    prev = copies.get(source)
+                    if prev is not None and prev[1] is not None:
+                        source = prev[1]
+                    newcopies[dest] = (c, source)
+                assert newcopies is not copies
+            for f in removed:
+                if f in newcopies:
+                    if newcopies is copies:
+                        # copy on write to avoid affecting potential other
+                        # branches.  when there are no other branches, this
+                        # could be avoided.
+                        newcopies = copies.copy()
+                    newcopies[f] = (c, None)
+            othercopies = all_copies.get(c)
+            if othercopies is None:
+                all_copies[c] = newcopies
+            else:
+                # we are the second parent to work on c, we need to merge our
+                # work with the other.
+                #
+                # In case of conflict, parent 1 takes precedence over parent 2.
+                # This is an arbitrary choice made anew when implementing
+                # changeset based copies. It was made without regard to
+                # potential filelog related behavior.
+                if parent == 1:
+                    _merge_copies_dict_extra(
+                        othercopies, newcopies, isancestor, ismerged
+                    )
+                else:
+                    _merge_copies_dict_extra(
+                        newcopies, othercopies, isancestor, ismerged
+                    )
+                    all_copies[c] = newcopies
+
+    final_copies = {}
+    for dest, (tt, source) in all_copies[targetrev].items():
+        if source is not None:
+            final_copies[dest] = source
+    return final_copies
+
+
+def _merge_copies_dict_extra(minor, major, isancestor, ismerged):
+    """version of `_merge_copies_dict` that works with the Google
+    specific "extra" based storage for copy information"""
+    for dest, value in major.items():
+        other = minor.get(dest)
+        if other is None:
+            minor[dest] = value
+        else:
+            new_tt = value[0]
+            other_tt = other[0]
+            if value[1] == other[1]:
+                continue
+            # content from "major" wins, unless it is older
+            # than the branch point or there is a merge
+            if (
+                new_tt == other_tt
+                or not isancestor(new_tt, other_tt)
+                or ismerged(dest)
+            ):
+                minor[dest] = value
+
+
 def _forwardcopies(a, b, base=None, match=None):
     """find {dst@b: src@a} copy mapping where a is an ancestor of b"""