Patchwork D7124: copies: move from a copy on branchpoint to a copy on write approach

Submitter phabricator
Date Oct. 16, 2019, 11:48 p.m.
Message ID <differential-rev-PHID-DREV-5g6ewehyvehqxbxjrfwl-req@mercurial-scm.org>
Permalink /patch/42446/
State Superseded

Comments

phabricator - Oct. 16, 2019, 11:48 p.m.
marmoute created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Before this change, every branch point resulted in a copy of the dictionary
  containing the copy information. This can be very costly for branchy history
  with little rename information. Instead, we take a "copy on write" approach,
  copying the input data only when we are about to update it.
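
The copy-on-write idea can be sketched in isolation as follows. This is a minimal illustration and not the code in `mercurial/copies.py`: the helper name `apply_removals` and the sample file names are hypothetical, and only the removal step of the patch is modelled.

```python
def apply_removals(copies, removed):
    """Drop `removed` destinations from `copies`, copying only on first write.

    `copies` maps destination -> source and may be shared with other
    branches, so it must never be mutated in place.
    """
    newcopies = copies
    for f in removed:
        if f in newcopies:
            if newcopies is copies:
                # First write: take a private copy so that other branches
                # holding a reference to `copies` are not affected.
                newcopies = copies.copy()
            del newcopies[f]
    return newcopies


# Hypothetical copy information shared by two children of a branch point.
shared = {"b.txt": "a.txt", "d.txt": "c.txt"}

untouched = apply_removals(shared, ["unrelated.txt"])  # nothing to delete
changed = apply_removals(shared, ["b.txt"])            # triggers the copy
```

When nothing has to be deleted, the function returns the very same dictionary object and no copy is made. Only a child that actually removes an entry pays for the copy, and the shared input is left intact for its siblings.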
  
  In practice we were already doing the copy in half of these cases (because
  `_chain` makes a copy), so we don't add a significant cost here, even in the
  linear case. However, the speedup in the branchy case is very significant
  (up to roughly 1.8x in the timings below). Here are some timings on the pypy
  repository.
  
  revision: large amount; added files: large amount; rename small amount; c3b14617fbd7 9ba6ab77fd29
  before: ! wall 1.399863 comb 1.400000 user 1.370000 sys 0.030000 (median of 10)
  after:  ! wall 0.766453 comb 0.770000 user 0.750000 sys 0.020000 (median of 11)
  revision: large amount; added files: small amount; rename small amount; c3b14617fbd7 f650a9b140d2
  before: ! wall 1.876748 comb 1.890000 user 1.870000 sys 0.020000 (median of 10)
  after:  ! wall 1.167223 comb 1.170000 user 1.150000 sys 0.020000 (median of 10)
  revision: large amount; added files: large amount; rename large amount; 08ea3258278e d9fa043f30c0
  before: ! wall 0.242457 comb 0.240000 user 0.240000 sys 0.000000 (median of 39)
  after:  ! wall 0.211476 comb 0.210000 user 0.210000 sys 0.000000 (median of 45)
  revision: small amount; added files: large amount; rename large amount; df6f7a526b60 a83dc6a2d56f
  before: ! wall 0.013193 comb 0.020000 user 0.020000 sys 0.000000 (median of 224)
  after:  ! wall 0.013290 comb 0.010000 user 0.010000 sys 0.000000 (median of 222)
  revision: small amount; added files: large amount; rename small amount; 4aa4e1f8e19a 169138063d63
  before: ! wall 0.001673 comb 0.000000 user 0.000000 sys 0.000000 (median of 1000)
  after:  ! wall 0.001677 comb 0.000000 user 0.000000 sys 0.000000 (median of 1000)
  revision: small amount; added files: small amount; rename small amount; 4bc173b045a6 964879152e2e
  before: ! wall 0.000119 comb 0.000000 user 0.000000 sys 0.000000 (median of 8023)
  after:  ! wall 0.000119 comb 0.000000 user 0.000000 sys 0.000000 (median of 7997)
  revision: medium amount; added files: large amount; rename medium amount; c95f1ced15f2 2c68e87c3efe
  before: ! wall 0.201898 comb 0.210000 user 0.200000 sys 0.010000 (median of 48)
  after:  ! wall 0.167415 comb 0.170000 user 0.160000 sys 0.010000 (median of 58)
  revision: medium amount; added files: medium amount; rename small amount; d343da0c55a8 d7746d32bf9d
  before: ! wall 0.036820 comb 0.040000 user 0.040000 sys 0.000000 (median of 100)
  after:  ! wall 0.035797 comb 0.040000 user 0.040000 sys 0.000000 (median of 100)
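
To compare the cases at a glance, the wall-clock medians above can be turned into speedup ratios. The snippet below is only a convenience sketch: the `timings` table is transcribed from the figures listed above, and the `speedup` helper is a hypothetical name, not part of Mercurial's perf tooling.

```python
# (before, after) wall-clock medians in seconds, keyed by the revision pair.
timings = {
    "c3b14617fbd7 9ba6ab77fd29": (1.399863, 0.766453),
    "c3b14617fbd7 f650a9b140d2": (1.876748, 1.167223),
    "08ea3258278e d9fa043f30c0": (0.242457, 0.211476),
    "df6f7a526b60 a83dc6a2d56f": (0.013193, 0.013290),
    "4aa4e1f8e19a 169138063d63": (0.001673, 0.001677),
    "4bc173b045a6 964879152e2e": (0.000119, 0.000119),
    "c95f1ced15f2 2c68e87c3efe": (0.201898, 0.167415),
    "d343da0c55a8 d7746d32bf9d": (0.036820, 0.035797),
}


def speedup(before, after):
    """Return how many times faster `after` is compared to `before`."""
    return before / after


for revs, (before, after) in timings.items():
    print(f"{revs}: {speedup(before, after):.2f}x")
```

A ratio above 1 means the patched code is faster; a ratio very close to 1 means the difference is within what these measurements can resolve.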
  
  The extra cost in the linear case can be reclaimed later with some extra logic.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7124

AFFECTED FILES
  mercurial/copies.py


To: marmoute, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -268,15 +268,19 @@ 
                 childcopies = {
                     dst: src for dst, src in childcopies.items() if match(dst)
                 }
-            # Copy the dict only if later iterations will also need it
-            if i != len(children[r]) - 1:
-                newcopies = copies.copy()
-            else:
-                newcopies = copies
+            newcopies = copies
             if childcopies:
                 newcopies = _chain(newcopies, childcopies)
+                # _chain makes a copies, we can avoid doing so in some
+                # simple/linear cases.
+                assert newcopies is not copies
             for f in removed:
                 if f in newcopies:
+                    if newcopies is copies:
+                        # copy on write to avoid affecting potential other
+                        # branches.  when there are no other branches, this
+                        # could be avoided.
+                        newcopies = copies.copy()
                     del newcopies[f]
             othercopies = all_copies.get(c)
             if othercopies is None: