Patchwork D6419: copies: do full filtering at end of _changesetforwardcopies()

Submitter phabricator
Date May 22, 2019, 12:32 a.m.
Message ID <differential-rev-PHID-DREV-i2hmqka3capfatum4oxf-req@phab.mercurial-scm.org>
Permalink /patch/40173/
State Superseded

Comments

phabricator - May 22, 2019, 12:32 a.m.
martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  As mentioned earlier, pathcopies() is very slow when copies are stored
  in the changeset. Most of the cost comes from calling _chainandfilter()
  for every changeset, which is slow because it needs to read manifests.
  It needs to read manifests to filter out copies that were created in
  one commit and then deleted in a later one. (It also filters out copies
  that were created from a file that didn't exist in the starting
  revision, but that revision is fixed across calls to
  _chainandfilter(), so that check is much cheaper.)
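A simplified model of the filtering described above may help. It assumes manifests are represented as plain sets of file paths and copies as a dict mapping destination path to source path; `filter_copies` is a hypothetical name for illustration and is not Mercurial's actual `_filter()`. In the old code, a check like this ran once per changeset, and each run needed that changeset's manifest.

```python
def filter_copies(start_files, end_files, copies):
    """Drop copy records that are invalid between two revisions.

    Hypothetical, simplified model (not Mercurial's code):
    start_files / end_files: sets of paths in the starting and ending
    revisions' manifests. copies: dict of destination -> source,
    modified in place.
    """
    for dst, src in list(copies.items()):
        if src not in start_files:
            # Copied from a file that didn't exist in the starting revision.
            del copies[dst]
        elif dst not in end_files:
            # Destination was created by a copy and later deleted.
            del copies[dst]


copies = {"b.txt": "a.txt", "tmp.txt": "a.txt", "c.txt": "new.txt"}
filter_copies({"a.txt"}, {"a.txt", "b.txt", "c.txt"}, copies)
# copies is now {'b.txt': 'a.txt'}
```

The second condition is the one that needs the ending revision's file list, which is why repeating it for every intermediate changeset is expensive.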
  
  This patch switches from _chainandfilter() to plain _chain() in the
  main loop in _changesetforwardcopies(). Instead of filtering there, it
  removes copies whose destination was subsequently removed, using
  ctx.filesremoved(). This relies on ctx.filesremoved() being fast.
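The new per-child step can be sketched as follows. This is a simplified model, assuming copies are dicts of destination path to source path; `chain` and `step` are hypothetical names, and the `filesremoved` list stands in for what `childctx.filesremoved()` returns in the real code. Note that no manifest is read here; the full filtering still happens once, via the `_filter(a, b, copies)` call added when the walk reaches the target revision.

```python
def chain(a, b):
    """Compose two copy dicts (hypothetical model of chaining).

    a: paths in an intermediate revision -> sources in the start revision.
    b: paths in the child revision -> sources in the intermediate revision.
    """
    t = dict(a)
    for dst, src in b.items():
        # If src was itself copied, follow it back to the original source.
        t[dst] = t.get(src, src)
    return t


def step(copies, childcopies, filesremoved):
    """Chain the child's copies, then drop destinations it removed."""
    result = chain(copies, childcopies)
    for f in filesremoved:
        result.pop(f, None)
    return result


# Parent copied a.txt to b.txt; the child then renamed b.txt to c.txt.
result = step({"b.txt": "a.txt"}, {"c.txt": "b.txt"}, ["b.txt"])
# result is {'c.txt': 'a.txt'}
```

Dropping removed destinations per child only touches the child's own removed-files list, which is what makes this cheaper than reading a manifest at every step.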
  
  I timed this command in mozilla-unified:
  
    hg debugpathcopies FIREFOX_59_0b3_BUILD2 FIREFOX_BETA_59_END
  
  It took 18s before and 1.1s after. It's still faster when copy
  information is stored in filelogs: 0.70s. It also still gets slow when
  there are merge commits involved, because we read manifests there
  too. We'll deal with that later.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6419

AFFECTED FILES
  mercurial/copies.py

CHANGE DETAILS

To: martinvonz, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -300,6 +300,7 @@ 
         else:
             copies = copies1
         if r == b.rev():
+            _filter(a, b, copies)
             return copies
         for c in children[r]:
             childctx = repo[c]
@@ -313,7 +314,10 @@ 
             if not match.always():
                 childcopies = {dst: src for dst, src in childcopies.items()
                                if match(dst)}
-            childcopies = _chainandfilter(a, childctx, copies, childcopies)
+            childcopies = _chain(copies, childcopies)
+            for f in childctx.filesremoved():
+                if f in childcopies:
+                    del childcopies[f]
             heapq.heappush(work, (c, parent, childcopies))
     assert False