Patchwork manifest: skip fastdelta if the change is large

Submitter Durham Goode
Date Nov. 6, 2015, 3:01 a.m.
Message ID <2B150494-BD93-4CE9-B414-9B58FC45CD32@fb.com>
Permalink /patch/11300/
State Accepted

Comments

Durham Goode - Nov. 6, 2015, 3:01 a.m.
On 11/5/15, 6:59 PM, "Mercurial-devel on behalf of Durham Goode" <mercurial-devel-bounces@selenic.com on behalf of durham@fb.com> wrote:



># HG changeset patch
># User Durham Goode <durham@fb.com>
># Date 1446778600 28800
>#      Thu Nov 05 18:56:40 2015 -0800
># Node ID 3b5d30cbbf1132be2325c5362a156038bdc84e2c
># Parent  f9984f76fd90e439221425d751e29bae17bec995
>manifest: skip fastdelta if the change is large
>
>In large repos, the existing manifest fastdelta computation (which performs a
>bisect on the raw manifest for every file that is changing) is excessively
>slow. This patch makes fastdelta fall back to the normal string delta algorithm
>if the number of changes is large (1000 or more).
>
>On a large repo with a commit of 8000 files, this reduces the commit time by 7
>seconds (fastdelta goes from 8 seconds to 1).
>
>I tested this change by modifying the function to compare the old and the new
>values and running the test suite. The only difference is that the pure
>text-diff algorithm sometimes produces smaller (but functionally identical)
>deltatexts than the bisect algorithm.


The no-whitespace-diff version of this patch is easier to review:

Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -334,6 +334,8 @@  class manifestdict(object):
         # zero copy representation of base as a buffer
         addbuf = util.buffer(base)
 
+        changes = list(changes)
+        if len(changes) < 1000:
         # start with a readonly loop that finds the offset of
         # each line and creates the deltas
         for f, todelete in changes:
@@ -364,6 +366,12 @@  class manifestdict(object):
             delta.append([dstart, dend, "".join(dline)])
         # apply the delta to the base, and get a delta for addrevision
         deltatext, arraytext = _addlistdelta(base, delta)
+        else:
+            # For large changes, it's much cheaper to just build the text and
+            # diff it.
+            arraytext = array.array('c', self.text())
+            deltatext = mdiff.textdiff(base, arraytext)
+
         return arraytext, deltatext
 
 def _msearch(m, s, lo=0, hi=None):
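
For readers who want the shape of the change without the surrounding manifest code, here is a rough standalone sketch of the dispatch in the patch above. It is not Mercurial's implementation: it models a manifest as parallel sorted lists of paths and nodes rather than raw text, it produces the updated lists instead of a delta, and the names apply_by_bisect, apply_by_rebuild, apply_changes, and LARGE_CHANGE_THRESHOLD are made up for illustration. Only the cutoff of 1000 and the list(changes) materialization are taken from the patch.

```python
import bisect

# Hypothetical constant; the value mirrors the 1000 used in the patch.
LARGE_CHANGE_THRESHOLD = 1000


def apply_by_bisect(paths, nodes, changes):
    """Small-change path: binary-search for each changed file individually."""
    paths, nodes = list(paths), list(nodes)
    for path, node in changes:
        i = bisect.bisect_left(paths, path)
        present = i < len(paths) and paths[i] == path
        if node is None:
            # A node of None marks a deletion in this sketch.
            if present:
                del paths[i]
                del nodes[i]
        elif present:
            nodes[i] = node          # modified file
        else:
            paths.insert(i, path)    # added file, keeps the list sorted
            nodes.insert(i, node)
    return paths, nodes


def apply_by_rebuild(paths, nodes, changes):
    """Large-change path: rebuild the whole mapping once, then sort once."""
    merged = dict(zip(paths, nodes))
    for path, node in changes:
        if node is None:
            merged.pop(path, None)
        else:
            merged[path] = node
    new_paths = sorted(merged)
    return new_paths, [merged[p] for p in new_paths]


def apply_changes(paths, nodes, changes, threshold=LARGE_CHANGE_THRESHOLD):
    """Pick a strategy based on how many files are changing."""
    # Materialize first so len() also works when changes is an iterator,
    # as the patch does with changes = list(changes).
    changes = list(changes)
    if len(changes) < threshold:
        return apply_by_bisect(paths, nodes, changes)
    return apply_by_rebuild(paths, nodes, changes)
```

Both strategies are meant to give the same result for the same input, which is the property the commit message describes checking by comparing old and new values under the test suite. Only the cost differs as the number of changes grows.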