Patchwork [2,of,6,V2] similar: get rid of quadratic addedfiles.remove()

login
register
mail settings
Submitter Yuya Nishihara
Date March 23, 2017, 2:13 p.m.
Message ID <8b33a51771425d2623aa.1490278424@mimosa>
Download mbox | patch
Permalink /patch/19600/
State Accepted
Headers show

Comments

Yuya Nishihara - March 23, 2017, 2:13 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1490269833 -32400
#      Thu Mar 23 20:50:33 2017 +0900
# Node ID 8b33a51771425d2623aa37dbcff05cdb5cc8e2fa
# Parent  146e03f2e50d8fe4ef85eecbacb495724a6d0664
similar: get rid of quadratic addedfiles.remove()

Instead, build a set of files to be removed and recreate addedfiles
only if necessary.

Benchmark with 50k added/removed files, on tmpfs:

  $ hg addremove --dry-run --time -q

  original:   real 16.550 secs (user 15.000+0.000 sys 1.540+0.000)
  previous:   real 16.730 secs (user 15.280+0.000 sys 1.440+0.000)
  this patch: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)

Patch

diff --git a/mercurial/similar.py b/mercurial/similar.py
--- a/mercurial/similar.py
+++ b/mercurial/similar.py
@@ -107,12 +107,14 @@  def findrenames(repo, added, removed, th
                     if fp in parentctx and parentctx[fp].size() > 0]
 
     # Find exact matches.
-    for (a, b) in _findexactmatches(repo, addedfiles[:], removedfiles):
-        addedfiles.remove(b)
+    matchedfiles = set()
+    for (a, b) in _findexactmatches(repo, addedfiles, removedfiles):
+        matchedfiles.add(b)
         yield (a.path(), b.path(), 1.0)
 
     # If the user requested similar files to be matched, search for them also.
     if threshold < 1.0:
+        addedfiles = [x for x in addedfiles if x not in matchedfiles]
         for (a, b, score) in _findsimilarmatches(repo, addedfiles,
                                                  removedfiles, threshold):
             yield (a.path(), b.path(), score)