From patchwork Thu Nov 3 21:34:12 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [2, of, 5, rfc] bdiff: adjust criteria for getting optimal longest match in the middle From: Mads Kiilerich X-Patchwork-Id: 17323 Message-Id: To: mercurial-devel@mercurial-scm.org Date: Thu, 03 Nov 2016 22:34:12 +0100 # HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID c593308da04e9144da01a08401d886a64985c74b # Parent f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 bdiff: adjust criteria for getting optimal longest match in the middle We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests shows a slightly different output - and sometimes slightly smaller bundles. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -146,7 +146,7 @@ static int longest_match(struct bdiff_li if (a2 - a1 > 30000) a1 = a2 - 30000; - half = (a1 + a2) / 2; + half = (a1 + a2 - 1) / 2; for (i = a1; i < a2; i++) { /* skip all lines in b after the current block */ @@ -172,7 +172,7 @@ static int longest_match(struct bdiff_li /* best match so far? we prefer matches closer to the middle to balance recursion */ - if (k > mk || (k == mk && (i <= mi || i < half))) { + if (k > mk || (k == mk && (i <= mi || i <= half))) { mi = i; mj = j; mk = k; diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t --- a/tests/test-bhalf.t +++ b/tests/test-bhalf.t @@ -33,20 +33,21 @@ Diff of boring files: -once upon a time 9 -The quick brown fox jumps over the lazy dog -once upon a time 10 + -The quick brown fox jumps over the lazy dog + -once upon a time 11 +twice upon a time 2 The quick brown fox jumps over the lazy dog - -once upon a time 11 + -once upon a time 12 -The quick brown fox jumps over the lazy dog - -once upon a time 12 + -once upon a time 13 +twice upon a time 3 The quick brown fox jumps over the lazy dog - -once upon a time 13 + -once upon a time 14 +twice upon a time 4 The quick brown fox jumps over the lazy dog - -once upon a time 14 + -once upon a time 15 +twice upon a time 5 - The quick brown fox jumps over the lazy dog - -once upon a time 15 + +The quick brown fox jumps over the lazy dog +twice upon a time 6 +The quick brown fox jumps over the lazy dog +twice upon a time 7 @@ -76,7 +77,7 @@ That's an odd diff for a trivial change! \$ cd .. \$ f --size bundle.hg - bundle.hg: size=878 + bundle.hg: size=870 Explore some bdiff implementation edge cases: @@ -121,20 +122,21 @@ Explore some bdiff implementation edge c --- a/y +++ b/y @@ -1,3 +1,1 @@ + -a a -a - -a diff --git a/z b/z --- a/z +++ b/z @@ -1,5 +1,1 @@ -a + -a a -a -a - -a -x and y shows the preference for adding / removing at the end of sequences ... -z just seems weird. +x shows the preference for adding at the end of sequences ... +while y and z shows the preference for balanced recursion +which is more efficient in stack and performance. \$ cd ..