Patchwork manifest: skip fastdelta if the change is large

login
register
mail settings
Submitter Durham Goode
Date Nov. 6, 2015, 2:59 a.m.
Message ID <3b5d30cbbf1132be2325.1446778786@dev8060.prn1.facebook.com>
Download mbox | patch
Permalink /patch/11299/
State Superseded
Commit 1cbf144fd8a170b2a5e2a7f68357368c5793d7a4
Headers show

Comments

Durham Goode - Nov. 6, 2015, 2:59 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1446778600 28800
#      Thu Nov 05 18:56:40 2015 -0800
# Node ID 3b5d30cbbf1132be2325c5362a156038bdc84e2c
# Parent  f9984f76fd90e439221425d751e29bae17bec995
manifest: skip fastdelta if the change is large

In large repos, the existing manifest fastdelta computation (which performs a
bisect on the raw manifest for every file that is changing), is excessively
slow. This patch makes fastdelta fallback to the normal string delta algorithm
if the number of changes is large.

On a large repo with a commit of 8000 files, this reduces the commit time by 7
seconds (fastdelta goes from 8 seconds to 1).

I tested this change by modifying the function to compare the old and the new
values and running the test suite. The only difference is that the pure
text-diff algorithm sometimes produces smaller (but functionaly identical)
deltatexts than the bisect algorithm.
Martin von Zweigbergk - Nov. 6, 2015, 5:45 a.m.
On Thu, Nov 5, 2015 at 7:00 PM Durham Goode <durham@fb.com> wrote:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1446778600 28800
> #      Thu Nov 05 18:56:40 2015 -0800
> # Node ID 3b5d30cbbf1132be2325c5362a156038bdc84e2c
> # Parent  f9984f76fd90e439221425d751e29bae17bec995
> manifest: skip fastdelta if the change is large
>
> In large repos, the existing manifest fastdelta computation (which
> performs a
> bisect on the raw manifest for every file that is changing), is excessively
> slow. This patch makes fastdelta fallback to the normal string delta
> algorithm
> if the number of changes is large.
>
> On a large repo with a commit of 8000 files, this reduces the commit time
> by 7
> seconds (fastdelta goes from 8 seconds to 1).
>

Nice. I've pushed this to the clowncopter, thanks.

It seems like we should be able to speed up the fastdelta() and _msearch()
code by working on the cached manifest (the lazymanifest has binary search
builtin) instead of the cached manifest text. But even if that does make it
faster, your patch is probably still an improvement. Besides, your patch
makes it faster today.


> I tested this change by modifying the function to compare the old and the
> new
> values and running the test suite. The only difference is that the pure
> text-diff algorithm sometimes produces smaller (but functionaly identical)
> deltatexts than the bisect algorithm.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -334,36 +334,44 @@ class manifestdict(object):
>          # zero copy representation of base as a buffer
>          addbuf = util.buffer(base)
>
> -        # start with a readonly loop that finds the offset of
> -        # each line and creates the deltas
> -        for f, todelete in changes:
> -            # bs will either be the index of the item or the insert point
> -            start, end = _msearch(addbuf, f, start)
> -            if not todelete:
> -                h, fl = self._lm[f]
> -                l = "%s\0%s%s\n" % (f, revlog.hex(h), fl)
> -            else:
> -                if start == end:
> -                    # item we want to delete was not found, error out
> -                    raise AssertionError(
> -                            _("failed to remove %s from manifest") % f)
> -                l = ""
> -            if dstart is not None and dstart <= start and dend >= start:
> -                if dend < end:
> +        changes = list(changes)
> +        if len(changes) < 1000:
> +            # start with a readonly loop that finds the offset of
> +            # each line and creates the deltas
> +            for f, todelete in changes:
> +                # bs will either be the index of the item or the insert
> point
> +                start, end = _msearch(addbuf, f, start)
> +                if not todelete:
> +                    h, fl = self._lm[f]
> +                    l = "%s\0%s%s\n" % (f, revlog.hex(h), fl)
> +                else:
> +                    if start == end:
> +                        # item we want to delete was not found, error out
> +                        raise AssertionError(
> +                                _("failed to remove %s from manifest") %
> f)
> +                    l = ""
> +                if dstart is not None and dstart <= start and dend >=
> start:
> +                    if dend < end:
> +                        dend = end
> +                    if l:
> +                        dline.append(l)
> +                else:
> +                    if dstart is not None:
> +                        delta.append([dstart, dend, "".join(dline)])
> +                    dstart = start
>                      dend = end
> -                if l:
> -                    dline.append(l)
> -            else:
> -                if dstart is not None:
> -                    delta.append([dstart, dend, "".join(dline)])
> -                dstart = start
> -                dend = end
> -                dline = [l]
> +                    dline = [l]
>
> -        if dstart is not None:
> -            delta.append([dstart, dend, "".join(dline)])
> -        # apply the delta to the base, and get a delta for addrevision
> -        deltatext, arraytext = _addlistdelta(base, delta)
> +            if dstart is not None:
> +                delta.append([dstart, dend, "".join(dline)])
> +            # apply the delta to the base, and get a delta for addrevision
> +            deltatext, arraytext = _addlistdelta(base, delta)
> +        else:
> +            # For large changes, it's much cheaper to just build the text
> and
> +            # diff it.
> +            arraytext = array.array('c', self.text())
> +            deltatext = mdiff.textdiff(base, arraytext)
> +
>          return arraytext, deltatext
>
>  def _msearch(m, s, lo=0, hi=None):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>

Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -334,36 +334,44 @@  class manifestdict(object):
         # zero copy representation of base as a buffer
         addbuf = util.buffer(base)
 
-        # start with a readonly loop that finds the offset of
-        # each line and creates the deltas
-        for f, todelete in changes:
-            # bs will either be the index of the item or the insert point
-            start, end = _msearch(addbuf, f, start)
-            if not todelete:
-                h, fl = self._lm[f]
-                l = "%s\0%s%s\n" % (f, revlog.hex(h), fl)
-            else:
-                if start == end:
-                    # item we want to delete was not found, error out
-                    raise AssertionError(
-                            _("failed to remove %s from manifest") % f)
-                l = ""
-            if dstart is not None and dstart <= start and dend >= start:
-                if dend < end:
+        changes = list(changes)
+        if len(changes) < 1000:
+            # start with a readonly loop that finds the offset of
+            # each line and creates the deltas
+            for f, todelete in changes:
+                # bs will either be the index of the item or the insert point
+                start, end = _msearch(addbuf, f, start)
+                if not todelete:
+                    h, fl = self._lm[f]
+                    l = "%s\0%s%s\n" % (f, revlog.hex(h), fl)
+                else:
+                    if start == end:
+                        # item we want to delete was not found, error out
+                        raise AssertionError(
+                                _("failed to remove %s from manifest") % f)
+                    l = ""
+                if dstart is not None and dstart <= start and dend >= start:
+                    if dend < end:
+                        dend = end
+                    if l:
+                        dline.append(l)
+                else:
+                    if dstart is not None:
+                        delta.append([dstart, dend, "".join(dline)])
+                    dstart = start
                     dend = end
-                if l:
-                    dline.append(l)
-            else:
-                if dstart is not None:
-                    delta.append([dstart, dend, "".join(dline)])
-                dstart = start
-                dend = end
-                dline = [l]
+                    dline = [l]
 
-        if dstart is not None:
-            delta.append([dstart, dend, "".join(dline)])
-        # apply the delta to the base, and get a delta for addrevision
-        deltatext, arraytext = _addlistdelta(base, delta)
+            if dstart is not None:
+                delta.append([dstart, dend, "".join(dline)])
+            # apply the delta to the base, and get a delta for addrevision
+            deltatext, arraytext = _addlistdelta(base, delta)
+        else:
+            # For large changes, it's much cheaper to just build the text and
+            # diff it.
+            arraytext = array.array('c', self.text())
+            deltatext = mdiff.textdiff(base, arraytext)
+
         return arraytext, deltatext
 
 def _msearch(m, s, lo=0, hi=None):