Patchwork [2 of 2] copies._forwardcopies: use set operations to find missing files

Submitter Siddharth Agarwal
Date April 5, 2013, 3:39 a.m.
Message ID <81795c250f6f63cffabb.1365133190@sid0x220>
Permalink /patch/1256/
State Accepted
Commit 3cfaace0441e936826a4a0d4ae12d545f11a903d

Comments

Siddharth Agarwal - April 5, 2013, 3:39 a.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1365132149 25200
#      Thu Apr 04 20:22:29 2013 -0700
# Node ID 81795c250f6f63cffabbedf0ca0322caaa9c368c
# Parent  d59b8e8a9294d40dfc04e6fe8c295fa05d5a7e40
copies._forwardcopies: use set operations to find missing files

This is a performance win for a number of reasons:
- We don't iterate over contexts, which avoids a completely unnecessary sorted
  call + the O(number of files) abstraction cost of doing that.
- We don't check membership in a context, which avoids another
  O(number of files) abstraction cost.
- We iterate over the manifests in C instead of Python.

For a large repo with 170,000 files, this improves perfpathcopies from 0.34
seconds to 0.07. Anything that uses pathcopies, such as rebase or diff --git
between two revisions, benefits.
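
The change described above can be illustrated outside Mercurial with a minimal sketch. It assumes plain dicts stand in for the manifests, and it is written for Python 3, where iterating a dict yields its keys (the patch itself calls iterkeys() on Python 2 manifests). The function names and sample paths are hypothetical and are not part of Mercurial.

```python
def missing_by_loop(a, b):
    """Shape of the old code: test each key of b for membership in a."""
    return {f for f in b if f not in a}


def missing_by_set_difference(a, b):
    """Shape of the new code: build a set from b's keys, then drop a's keys."""
    missing = set(b)              # stand-in for set(b.manifest().iterkeys())
    missing.difference_update(a)  # removes every key that also appears in a
    return missing


# Hypothetical manifests mapping file path -> node id.
manifest_a = {"README": "n1", "src/main.py": "n2"}
manifest_b = {"README": "n1", "src/app.py": "n3", "docs/index.txt": "n4"}

result = missing_by_set_difference(manifest_a, manifest_b)
```

Both helpers return the same set of paths. The set has no defined iteration order, which does not matter in _forwardcopies because the results are stored in the dict cm keyed by file name.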

Bryan O'Sullivan - April 5, 2013, 3:35 p.m.
On Thu, Apr 4, 2013 at 8:39 PM, Siddharth Agarwal <sid0@fb.com> wrote:

> copies._forwardcopies: use set operations to find missing files
>

Crewed both, thanks.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -133,11 +133,13 @@  def _forwardcopies(a, b):
     # we currently don't try to find where old files went, too expensive
     # this means we can miss a case like 'hg rm b; hg cp a b'
     cm = {}
-    for f in b:
-        if f not in a:
-            ofctx = _tracefile(b[f], a)
-            if ofctx:
-                cm[f] = ofctx.path()
+    missing = set(b.manifest().iterkeys())
+    missing.difference_update(a.manifest().iterkeys())
+
+    for f in missing:
+        ofctx = _tracefile(b[f], a)
+        if ofctx:
+            cm[f] = ofctx.path()
 
     # combine copies from dirstate if necessary
     if w is not None: