Patchwork [4,of,6,V2] similar: do not look up and create filectx more than once

Submitter Yuya Nishihara
Date March 23, 2017, 2:13 p.m.
Message ID <9c622785c2b0a4f4f157.1490278426@mimosa>
Permalink /patch/19602/
State Accepted

Comments

Yuya Nishihara - March 23, 2017, 2:13 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1490271428 -32400
#      Thu Mar 23 21:17:08 2017 +0900
# Node ID 9c622785c2b0a4f4f157dd8ac6a0d0c6816d3725
# Parent  f1b08899545270dea26ac954d21a02ebd5beb5f3
similar: do not look up and create filectx more than once

Benchmark with 50k added/removed files, on tmpfs:

  $ hg addremove --dry-run --time -q

  previous:   real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
  this patch: real 12.420 secs (user 11.120+0.000 sys 1.280+0.000)
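
The saving comes from evaluating each ctx[fp] lookup once instead of twice.
A minimal sketch of the difference follows; it does not use Mercurial itself.
FakeFileCtx and CountingCtx are hypothetical stand-ins for a filectx and a
changectx, and only _dropempty is taken from the patch.

```python
class FakeFileCtx(object):
    """Hypothetical stand-in for a filectx; only size() is modelled."""
    def __init__(self, data):
        self._data = data

    def size(self):
        return len(self._data)


class CountingCtx(object):
    """Hypothetical stand-in for a context that counts item lookups."""
    def __init__(self, files):
        self._files = files
        self.lookups = 0

    def __contains__(self, path):
        return path in self._files

    def __getitem__(self, path):
        # Each lookup builds a fresh file context, as the patch title implies.
        self.lookups += 1
        return FakeFileCtx(self._files[path])


def _dropempty(fctxs):
    # Same helper as in the patch: keep only non-empty file contexts.
    return [x for x in fctxs if x.size() > 0]


files = {'a': b'x', 'b': b'', 'c': b'yz'}

# Old form: one lookup in the filter, a second one for every kept file.
old = CountingCtx(files)
oldresult = [old[fp] for fp in sorted(files) if old[fp].size() > 0]

# New form: the generator yields one context per file, filtered afterwards.
new = CountingCtx(files)
newresult = _dropempty(new[fp] for fp in sorted(files))

print(old.lookups, new.lookups)  # 5 lookups before, 3 after
```

With three files, one of them empty, the old comprehension performs five
lookups and the new form performs three, while both keep the same two files.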

Patch

diff --git a/mercurial/similar.py b/mercurial/similar.py
--- a/mercurial/similar.py
+++ b/mercurial/similar.py
@@ -93,6 +93,9 @@  def _findsimilarmatches(repo, added, rem
         source, bscore = v
         yield source, dest, bscore
 
+def _dropempty(fctxs):
+    return [x for x in fctxs if x.size() > 0]
+
 def findrenames(repo, added, removed, threshold):
     '''find renamed files -- yields (before, after, score) tuples'''
     wctx = repo[None]
@@ -101,10 +104,8 @@  def findrenames(repo, added, removed, th
     # Zero length files will be frequently unrelated to each other, and
     # tracking the deletion/addition of such a file will probably cause more
     # harm than good. We strip them out here to avoid matching them later on.
-    addedfiles = [wctx[fp] for fp in sorted(added)
-                  if wctx[fp].size() > 0]
-    removedfiles = [pctx[fp] for fp in sorted(removed)
-                    if fp in pctx and pctx[fp].size() > 0]
+    addedfiles = _dropempty(wctx[fp] for fp in sorted(added))
+    removedfiles = _dropempty(pctx[fp] for fp in sorted(removed) if fp in pctx)
 
     # Find exact matches.
     matchedfiles = set()