Patchwork [3,of,7,v2] bdiff: adjust criteria for getting optimal longest match in the A side middle

login
register
mail settings
Submitter Mads Kiilerich
Date Nov. 15, 2016, 8:57 p.m.
Message ID <fbce612c9be26860338f.1479243424@madski>
Download mbox | patch
Permalink /patch/17590/
State Accepted
Headers show

Comments

Mads Kiilerich - Nov. 15, 2016, 8:57 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1478626653 -3600
#      Tue Nov 08 18:37:33 2016 +0100
# Node ID fbce612c9be26860338f6ba4ab5df819b6d79521
# Parent  0f97aaa1b3f5c25e02296c73e6e93d77a197910c
bdiff: adjust criteria for getting optimal longest match in the A side middle

We prefer matches closer to the middle to balance recursion, as introduced in
f1ca249696ed.

For ranges with uneven length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.

For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.

This will change the heuristics. Tests shows a slightly different output - and
sometimes slightly smaller bundles.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22804885 to 22803824 bytes - an 0.005% reduction.

Patch

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -151,7 +151,7 @@  static int longest_match(struct bdiff_li
 	if (a2 - a1 > 30000)
 		a1 = a2 - 30000;
 
-	half = (a1 + a2) / 2;
+	half = (a1 + a2 - 1) / 2;
 
 	for (i = a1; i < a2; i++) {
 		/* skip all lines in b after the current block */
@@ -177,7 +177,7 @@  static int longest_match(struct bdiff_li
 
 			/* best match so far? we prefer matches closer
 			   to the middle to balance recursion */
-			if (k > mk || (k == mk && (i <= mi || i < half))) {
+			if (k > mk || (k == mk && (i <= mi || i <= half))) {
 				mi = i;
 				mj = j;
 				mk = k;
diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py
--- a/tests/test-bdiff.py
+++ b/tests/test-bdiff.py
@@ -88,7 +88,7 @@  print("Diff 1 to 3 lines - preference fo
 showdiff('a\n', 'a\n' * 3)
 print("Diff 1 to 5 lines - preference for adding / removing at the end of sequences:")
 showdiff('a\n', 'a\n' * 5)
-print("Diff 3 to 1 lines - preference for adding / removing at the end of sequences:")
+print("Diff 3 to 1 lines - preference for balanced recursion:")
 showdiff('a\n' * 3, 'a\n')
-print("Diff 5 to 1 lines - this diff seems weird:")
+print("Diff 5 to 1 lines - preference for balanced recursion:")
 showdiff('a\n' * 5, 'a\n')
diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out
--- a/tests/test-bdiff.py.out
+++ b/tests/test-bdiff.py.out
@@ -67,16 +67,17 @@  showdiff(
   'a\na\na\na\na\n'):
  'a\n'
  2 2 '' -> 'a\na\na\na\n'
-Diff 3 to 1 lines - preference for adding / removing at the end of sequences:
+Diff 3 to 1 lines - preference for balanced recursion:
 showdiff(
   'a\na\na\n',
   'a\n'):
+ 0 2 'a\n' -> ''
  'a\n'
- 2 6 'a\na\n' -> ''
-Diff 5 to 1 lines - this diff seems weird:
+ 4 6 'a\n' -> ''
+Diff 5 to 1 lines - preference for balanced recursion:
 showdiff(
   'a\na\na\na\na\n',
   'a\n'):
- 0 2 'a\n' -> ''
+ 0 4 'a\na\n' -> ''
  'a\n'
- 4 10 'a\na\na\n' -> ''
+ 6 10 'a\na\n' -> ''