Patchwork [2,of,5,rfc] bdiff: adjust criteria for getting optimal longest match in the middle

login
register
mail settings
Submitter Mads Kiilerich
Date Nov. 3, 2016, 9:34 p.m.
Message ID <c593308da04e9144da01.1478208852@madski>
Download mbox | patch
Permalink /patch/17323/
State Deferred
Headers show

Comments

Mads Kiilerich - Nov. 3, 2016, 9:34 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1478208837 -3600
#      Thu Nov 03 22:33:57 2016 +0100
# Node ID c593308da04e9144da01a08401d886a64985c74b
# Parent  f6408efe0d0f4179fe6cc2b967164c1b4567f3d6
bdiff: adjust criteria for getting optimal longest match in the middle

We prefer matches closer to the middle to balance recursion, as introduced in
f1ca249696ed.

For ranges with uneven length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.

For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.

This will change the heuristics. Tests shows a slightly different output - and
sometimes slightly smaller bundles.

Patch

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -146,7 +146,7 @@  static int longest_match(struct bdiff_li
 	if (a2 - a1 > 30000)
 		a1 = a2 - 30000;
 
-	half = (a1 + a2) / 2;
+	half = (a1 + a2 - 1) / 2;
 
 	for (i = a1; i < a2; i++) {
 		/* skip all lines in b after the current block */
@@ -172,7 +172,7 @@  static int longest_match(struct bdiff_li
 
 			/* best match so far? we prefer matches closer
 			   to the middle to balance recursion */
-			if (k > mk || (k == mk && (i <= mi || i < half))) {
+			if (k > mk || (k == mk && (i <= mi || i <= half))) {
 				mi = i;
 				mj = j;
 				mk = k;
diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
--- a/tests/test-bhalf.t
+++ b/tests/test-bhalf.t
@@ -33,20 +33,21 @@  Diff of boring files:
   -once upon a time 9
   -The quick brown fox jumps over the lazy dog
   -once upon a time 10
+  -The quick brown fox jumps over the lazy dog
+  -once upon a time 11
   +twice upon a time 2
    The quick brown fox jumps over the lazy dog
-  -once upon a time 11
+  -once upon a time 12
   -The quick brown fox jumps over the lazy dog
-  -once upon a time 12
+  -once upon a time 13
   +twice upon a time 3
    The quick brown fox jumps over the lazy dog
-  -once upon a time 13
+  -once upon a time 14
   +twice upon a time 4
    The quick brown fox jumps over the lazy dog
-  -once upon a time 14
+  -once upon a time 15
   +twice upon a time 5
-   The quick brown fox jumps over the lazy dog
-  -once upon a time 15
+  +The quick brown fox jumps over the lazy dog
   +twice upon a time 6
   +The quick brown fox jumps over the lazy dog
   +twice upon a time 7
@@ -76,7 +77,7 @@  That's an odd diff for a trivial change!
   $ cd ..
 
   $ f --size bundle.hg
-  bundle.hg: size=878
+  bundle.hg: size=870
 
 Explore some bdiff implementation edge cases:
 
@@ -121,20 +122,21 @@  Explore some bdiff implementation edge c
   --- a/y
   +++ b/y
   @@ -1,3 +1,1 @@
+  -a
    a
   -a
-  -a
   diff --git a/z b/z
   --- a/z
   +++ b/z
   @@ -1,5 +1,1 @@
   -a
+  -a
    a
   -a
   -a
-  -a
 
-x and y shows the preference for adding / removing at the end of sequences ...
-z just seems weird.
+x shows the preference for adding at the end of sequences ...
+while y and z shows the preference for balanced recursion
+which is more efficient in stack and performance.
 
   $ cd ..