Patchwork [3 of 4] delta: exclude base candidate much smaller than the target

Submitter Boris Feld
Date Dec. 14, 2018, 9:24 p.m.
Message ID <571449ab5d5855256402.1544822677@localhost.localdomain>
Permalink /patch/37163/
State Accepted

Comments

Boris Feld - Dec. 14, 2018, 9:24 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1544089163 -3600
#      Thu Dec 06 10:39:23 2018 +0100
# Node ID 571449ab5d585525640279fef7979f9cc4f31637
# Parent  2ae099bc17b55791398ebc00d38fb4f988a1e2e3
# EXP-Topic sparse-revlog-corner-cases
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 571449ab5d58
delta: exclude base candidate much smaller than the target

When sparse-revlog is in use, if a revision's full text is roughly 50 times
(_GIVE_UP_RATIO) bigger than a base candidate's full text, we no longer
consider that candidate.
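A minimal sketch of the exclusion rule as a standalone predicate, assuming the ratio of 50 and the floor-division comparison used in the patch. The function name `is_excluded_base` is hypothetical and not part of Mercurial; it only mirrors the one-line check added to `_candidategroups`.

```python
# Hypothetical standalone version of the check the patch adds to
# _candidategroups; not Mercurial API.
GIVE_UP_RATIO = 50  # same value as _GIVE_UP_RATIO in the patch


def is_excluded_base(candidate_rawsize: int, textlen: int,
                     sparse: bool = True) -> bool:
    """Return True if a base candidate is too small to be worth trying.

    Only sparse revlogs apply the filter. A candidate is skipped when its
    full-text size is strictly below textlen // GIVE_UP_RATIO.
    """
    if not sparse:
        return False
    return candidate_rawsize < textlen // GIVE_UP_RATIO


if __name__ == "__main__":
    # 1 MiB target: the cutoff is 1_048_576 // 50 == 20_971 bytes.
    print(is_excluded_base(100, 1_048_576))     # True
    print(is_excluded_base(30_000, 1_048_576))  # False
    # Targets under 50 bytes: textlen // 50 == 0, nothing is excluded.
    print(is_excluded_base(0, 49))              # False
```

Because of the floor division and the strict `<`, a candidate whose size equals `textlen // 50` is still considered, and targets smaller than 50 bytes never exclude any base.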

This solves a pathological case we encountered on a very specific repository.
It contains a long series of changesets with a very small manifest (one file)
co-existing with other changesets using a very large manifest.

Without this filtering, we ended up considering a large number of tiny full
snapshots as potential bases. This resulted in very large deltas (about the
size of the full text) and Mercurial spending 99% of its time compressing
these deltas.

Commit time dropped from about 400s to about 10s (still slow, but not
ridiculously slow).

Yuya Nishihara - Dec. 15, 2018, 2:11 a.m.
On Fri, 14 Dec 2018 21:24:37 +0000, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1544089163 -3600
> #      Thu Dec 06 10:39:23 2018 +0100
> # Node ID 571449ab5d585525640279fef7979f9cc4f31637
> # Parent  2ae099bc17b55791398ebc00d38fb4f988a1e2e3
> # EXP-Topic sparse-revlog-corner-cases
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 571449ab5d58
> delta: exclude base candidate much smaller than the target

Queued 1, 2, and 4, thanks.

> +# If a revision's full text is that much bigger than a base candidate's full
> +# text, it is very unlikely that it will produce a valid delta. We no longer
> +# consider these candidates.
> +_GIVE_UP_RATIO = 50

Can you align this with LIMIT_DELTA2TEXT? Even though they are slightly
different (compressed vs uncompressed), they look quite similar.
"give up ratio" doesn't provide any hint about what the value is for.
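One way to read the suggestion, sketched under assumptions: name the new limit after LIMIT_DELTA2TEXT so both bounds read as ratios against the target text length. `LIMIT_BASE2TEXT` and `candidate_bounds` are hypothetical names chosen for illustration, and the value 2 for LIMIT_DELTA2TEXT is assumed for this sketch rather than taken from the thread.

```python
from typing import Tuple

# Sketch: both candidate filters expressed as named ratios against textlen.
LIMIT_DELTA2TEXT = 2   # assumed value; the thread does not state it
LIMIT_BASE2TEXT = 50   # hypothetical name for _GIVE_UP_RATIO


def candidate_bounds(textlen: int) -> Tuple[int, int]:
    """Return (deltas_limit, min_base_rawsize) for a target text.

    deltas_limit is compared against a candidate's stored (compressed)
    length, as with revlog.length(rev) in the patch; min_base_rawsize is
    compared against its uncompressed full-text size, revlog.rawsize(rev).
    """
    deltas_limit = textlen * LIMIT_DELTA2TEXT
    min_base_rawsize = textlen // LIMIT_BASE2TEXT
    return deltas_limit, min_base_rawsize


if __name__ == "__main__":
    print(candidate_bounds(1000))  # (2000, 20)
```

Pairing the two names makes the symmetry visible: one ratio caps how large a candidate's data may be, the other sets how small its full text may be.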

> @@ -614,6 +619,7 @@ def _candidategroups(revlog, textlen, p1
>  
>      deltalength = revlog.length
>      deltaparent = revlog.deltaparent
> +    sparse = revlog._sparserevlog
>      good = None
>  
>      deltas_limit = textlen * LIMIT_DELTA2TEXT
> @@ -644,6 +650,8 @@ def _candidategroups(revlog, textlen, p1
>              # filter out delta base that will never produce good delta
>              if deltas_limit < revlog.length(rev):
>                  continue
> +            if sparse and revlog.rawsize(rev) < (textlen // _GIVE_UP_RATIO):
> +                continue

Patch

diff --git a/mercurial/revlogutils/deltas.py b/mercurial/revlogutils/deltas.py
--- a/mercurial/revlogutils/deltas.py
+++ b/mercurial/revlogutils/deltas.py
@@ -601,6 +601,11 @@  def isgooddeltainfo(revlog, deltainfo, r
 
     return True
 
+# If a revision's full text is that much bigger than a base candidate's full
+# text, it is very unlikely that it will produce a valid delta. We no longer
+# consider these candidates.
+_GIVE_UP_RATIO = 50
+
 def _candidategroups(revlog, textlen, p1, p2, cachedelta):
     """Provides group of revision to be tested as delta base
 
@@ -614,6 +619,7 @@  def _candidategroups(revlog, textlen, p1
 
     deltalength = revlog.length
     deltaparent = revlog.deltaparent
+    sparse = revlog._sparserevlog
     good = None
 
     deltas_limit = textlen * LIMIT_DELTA2TEXT
@@ -644,6 +650,8 @@  def _candidategroups(revlog, textlen, p1
             # filter out delta base that will never produce good delta
             if deltas_limit < revlog.length(rev):
                 continue
+            if sparse and revlog.rawsize(rev) < (textlen // _GIVE_UP_RATIO):
+                continue
             # no delta for rawtext-changing revs (see "candelta" for why)
             if revlog.flags(rev) & REVIDX_RAWTEXT_CHANGING_FLAGS:
                 continue
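To see how the new check sits next to the existing `deltas_limit` filter, here is a self-contained sketch that replays only those two filters of the patched loop over a stub revlog. `StubRevlog` and `filter_candidates` are hypothetical, and the value 2 for LIMIT_DELTA2TEXT is an assumption; the real `_candidategroups` also handles grouping, cached deltas, rawtext-changing flags and other checks that this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

LIMIT_DELTA2TEXT = 2  # assumed value for this sketch
_GIVE_UP_RATIO = 50   # value introduced by the patch


@dataclass
class StubRevlog:
    """Hypothetical stand-in exposing only what the two filters read."""
    lengths: Dict[int, int]   # rev -> stored (compressed) chunk length
    rawsizes: Dict[int, int]  # rev -> full-text size
    _sparserevlog: bool = True

    def length(self, rev: int) -> int:
        return self.lengths[rev]

    def rawsize(self, rev: int) -> int:
        return self.rawsizes[rev]


def filter_candidates(revlog: StubRevlog, textlen: int,
                      candidates: Iterable[int]) -> List[int]:
    """Keep candidates that pass both size filters from the patched hunk."""
    sparse = revlog._sparserevlog
    deltas_limit = textlen * LIMIT_DELTA2TEXT
    kept = []
    for rev in candidates:
        # existing filter: the candidate's stored data is already too large
        if deltas_limit < revlog.length(rev):
            continue
        # new filter: the candidate's full text is too small to be useful
        if sparse and revlog.rawsize(rev) < (textlen // _GIVE_UP_RATIO):
            continue
        kept.append(rev)
    return kept


if __name__ == "__main__":
    rl = StubRevlog(
        lengths={0: 40, 1: 5_000, 2: 900_000, 3: 3_000_000},
        rawsizes={0: 120, 1: 800_000, 2: 1_000_000, 3: 1_100_000},
    )
    # rev 0 is a tiny snapshot (skipped), rev 3 exceeds deltas_limit.
    print(filter_candidates(rl, 1_000_000, [0, 1, 2, 3]))  # [1, 2]
```

With `_sparserevlog=False` the tiny snapshot (rev 0) would be kept again, which matches the patch applying the new filter only to sparse revlogs.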