Patchwork [1,of,3,V3] changegroup: document deltaparent's choice of previous revision

mail settings
Submitter Gregory Szorc
Date Oct. 13, 2016, 7:44 p.m.
Message ID <4c98fccd5b70d2f89fe4.1476387888@gps-mbp.local>
Download mbox | patch
Permalink /patch/17065/
State Accepted
Headers show


Gregory Szorc - Oct. 13, 2016, 7:44 p.m.
# HG changeset patch
# User Gregory Szorc <>
# Date 1476355787 -7200
#      Thu Oct 13 12:49:47 2016 +0200
# Node ID 4c98fccd5b70d2f89fe48bdb81288448daed758c
# Parent  c0410814002f467c24ef07ce73850ba15b306f8e
changegroup: document deltaparent's choice of previous revision

As part of debugging low-level changegroup generation, I came across
what I initially thought was a weird behavior: changegroup v2 is
choosing the previous revision in the changegroup as a delta base
instead of p1. I was tempted to rewrite this to use p1, as p1
will delta better than prev in the common case. However, I realized
that taking p1 as the base would potentially require resolving a
revision fulltext and thus require more CPU for e.g. server-side
processing of getbundle requests.

This patch tweaks the code comment to note the choice of behavior.
It also notes there is room for a flag or config option to tweak
this behavior later: using p1 as the delta base would likely make
changegroups smaller at the expense of more CPU, which could be
beneficial for things like clone bundles.


diff --git a/mercurial/ b/mercurial/
--- a/mercurial/
+++ b/mercurial/
@@ -817,10 +817,17 @@  class cg2packer(cg1packer):
             self._reorder = False
     def deltaparent(self, revlog, rev, p1, p2, prev):
         dp = revlog.deltaparent(rev)
-        # avoid storing full revisions; pick prev in those cases
-        # also pick prev when we can't be sure remote has dp
+        # Avoid sending full revisions when delta parent is null. Pick
+        # prev in that case. It's tempting to pick p1 in this case, as p1
+        # will be smaller in the common case. However, computing a delta
+        # against p1 may require resolving the raw text of p1, which could
+        # be expensive. The revlog caches should have prev cached, meaning
+        # less CPU for changegroup generation. There is likely room to add
+        # a flag and/or config option to control this behavior.
+        #
+        # Pick prev when we can't be sure remote has the base revision.
         if dp == nullrev or (dp != p1 and dp != p2 and dp != prev):
             return prev
         return dp