# Patchwork [2,of,5,rfc] bdiff: adjust criteria for getting optimal longest match in the middle

Submitter Mads Kiilerich Nov. 3, 2016, 9:34 p.m. mbox | patch /patch/17323/ Deferred show

Mads Kiilerich - Nov. 3, 2016, 9:34 p.m.
```# HG changeset patch
# Date 1478208837 -3600
#      Thu Nov 03 22:33:57 2016 +0100
# Node ID c593308da04e9144da01a08401d886a64985c74b
# Parent  f6408efe0d0f4179fe6cc2b967164c1b4567f3d6
bdiff: adjust criteria for getting optimal longest match in the middle

We prefer matches closer to the middle to balance recursion, as introduced in
f1ca249696ed.

For ranges with uneven length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.

For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.

This will change the heuristics. Tests shows a slightly different output - and
sometimes slightly smaller bundles.
```

## Patch

```diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -146,7 +146,7 @@  static int longest_match(struct bdiff_li
if (a2 - a1 > 30000)
a1 = a2 - 30000;

-	half = (a1 + a2) / 2;
+	half = (a1 + a2 - 1) / 2;

for (i = a1; i < a2; i++) {
/* skip all lines in b after the current block */
@@ -172,7 +172,7 @@  static int longest_match(struct bdiff_li

/* best match so far? we prefer matches closer
to the middle to balance recursion */
-			if (k > mk || (k == mk && (i <= mi || i < half))) {
+			if (k > mk || (k == mk && (i <= mi || i <= half))) {
mi = i;
mj = j;
mk = k;
diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
--- a/tests/test-bhalf.t
+++ b/tests/test-bhalf.t
@@ -33,20 +33,21 @@  Diff of boring files:
-once upon a time 9
-The quick brown fox jumps over the lazy dog
-once upon a time 10
+  -The quick brown fox jumps over the lazy dog
+  -once upon a time 11
+twice upon a time 2
The quick brown fox jumps over the lazy dog
-  -once upon a time 11
+  -once upon a time 12
-The quick brown fox jumps over the lazy dog
-  -once upon a time 12
+  -once upon a time 13
+twice upon a time 3
The quick brown fox jumps over the lazy dog
-  -once upon a time 13
+  -once upon a time 14
+twice upon a time 4
The quick brown fox jumps over the lazy dog
-  -once upon a time 14
+  -once upon a time 15
+twice upon a time 5
-   The quick brown fox jumps over the lazy dog
-  -once upon a time 15
+  +The quick brown fox jumps over the lazy dog
+twice upon a time 6
+The quick brown fox jumps over the lazy dog
+twice upon a time 7
@@ -76,7 +77,7 @@  That's an odd diff for a trivial change!
\$ cd ..

\$ f --size bundle.hg
-  bundle.hg: size=878
+  bundle.hg: size=870

Explore some bdiff implementation edge cases:

@@ -121,20 +122,21 @@  Explore some bdiff implementation edge c
--- a/y
+++ b/y
@@ -1,3 +1,1 @@
+  -a
a
-a
-  -a
diff --git a/z b/z
--- a/z
+++ b/z
@@ -1,5 +1,1 @@
-a
+  -a
a
-a
-a
-  -a

-x and y shows the preference for adding / removing at the end of sequences ...
-z just seems weird.
+x shows the preference for adding at the end of sequences ...
+while y and z shows the preference for balanced recursion
+which is more efficient in stack and performance.

\$ cd ..

```