Patchwork [4,of,4,V2] clone: add a server-side option to disable full getbundles (pull-based clones)

Submitter Siddharth Agarwal
Date May 11, 2017, 5:51 p.m.
Message ID <fa570690257581e3197f.1494525083@devvm028.frc2.facebook.com>
Permalink /patch/20575/
State Accepted

Comments

Siddharth Agarwal - May 11, 2017, 5:51 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1494525005 25200
#      Thu May 11 10:50:05 2017 -0700
# Node ID fa570690257581e3197f8730b6522e80d58a3f45
# Parent  052bd5cfe3769b10c64a4a39d9734a2740d44e16
clone: add a server-side option to disable full getbundles (pull-based clones)

For large enough repositories, pull-based clones take too long, and an attempt
to use one usually indicates a configuration problem or an outdated Mercurial
client. Add a config option to disable them.
Augie Fackler - May 13, 2017, 2:22 a.m.
On Thu, May 11, 2017 at 10:51:23AM -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1494525005 25200
> #      Thu May 11 10:50:05 2017 -0700
> # Node ID fa570690257581e3197f8730b6522e80d58a3f45
> # Parent  052bd5cfe3769b10c64a4a39d9734a2740d44e16
> clone: add a server-side option to disable full getbundles (pull-based clones)

queued, thanks

very nice - I can think of some repositories that should enable
clonebundles and this new rejection mechanism to save tons of server
load. :)
Gregory Szorc - May 13, 2017, 3:08 a.m.
On Fri, May 12, 2017 at 7:22 PM, Augie Fackler <raf@durin42.com> wrote:

> On Thu, May 11, 2017 at 10:51:23AM -0700, Siddharth Agarwal wrote:
> > # HG changeset patch
> > # User Siddharth Agarwal <sid0@fb.com>
> > # Date 1494525005 25200
> > #      Thu May 11 10:50:05 2017 -0700
> > # Node ID fa570690257581e3197f8730b6522e80d58a3f45
> > # Parent  052bd5cfe3769b10c64a4a39d9734a2740d44e16
> > clone: add a server-side option to disable full getbundles (pull-based
> clones)
>
> queued, thanks
>
> very nice - I can think of some repositories that should enable
> clonebundles and this new rejection mechanism to save tons of server
> load. :)
>

FWIW, it is common for >95% of hg.mozilla.org's served bytes to be handled
from clone bundles via S3/CDN. Our daily record saw 42,760,071,088,706 of
43,134,314,834,541 bytes (99.13%) served from S3. Most of this load is
Firefox's CI cloning the Firefox repositories. We make heavy use of volatile
"spot" instances rather than permanent infrastructure, so there's a lot of
cloning going on, even with aggressive reuse of clones on clients.

If we didn't have clone bundles, we'd need to roll our own caching layer
and/or pay a heavy cost for extra server infrastructure. Ironically, clone
bundles have offloaded so much CPU from servers that we can afford to take
computational hits on the server, such as allowing bundle1 clients to
clone/pull from generaldelta repos and leaving full bundle clones enabled.
But I'm still glad we have config knobs to lock out legacy clients at the
first sign of scaling trouble.
Augie Fackler - May 13, 2017, 3:09 a.m.
> On May 12, 2017, at 23:08, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> 
> On Fri, May 12, 2017 at 7:22 PM, Augie Fackler <raf@durin42.com> wrote:
> On Thu, May 11, 2017 at 10:51:23AM -0700, Siddharth Agarwal wrote:
> > # HG changeset patch
> > # User Siddharth Agarwal <sid0@fb.com>
> > # Date 1494525005 25200
> > #      Thu May 11 10:50:05 2017 -0700
> > # Node ID fa570690257581e3197f8730b6522e80d58a3f45
> > # Parent  052bd5cfe3769b10c64a4a39d9734a2740d44e16
> > clone: add a server-side option to disable full getbundles (pull-based clones)
> 
> queued, thanks
> 
> very nice - I can think of some repositories that should enable
> clonebundles and this new rejection mechanism to save tons of server
> load. :)
> 
> FWIW, it is common for >95% of hg.mozilla.org's served bytes to be handled from clone bundles via S3/CDN. Our daily record saw 42,760,071,088,706 of 43,134,314,834,541 bytes (99.13%) served from S3. Most of this load is Firefox's CI cloning the Firefox repositories. We make heavy use of volatile "spot" instances rather than permanent infrastructure, so there's a lot of cloning going on, even with aggressive reuse of clones on clients.
> 
> If we didn't have clone bundles, we'd need to roll our own caching layer and/or pay a heavy cost for extra server infrastructure. Ironically, clone bundles have offloaded so much CPU from servers that we can afford to take computational hits on the server, such as allowing bundle1 clients to clone/pull from generaldelta repos and leaving full bundle clones enabled. But I'm still glad we have config knobs to lock out legacy clients at the first sign of scaling trouble.

I was actually thinking of the Adium "full history" repo which is a few gigs and has had pull/clone disabled since it came into existence. But yes, your repos too. :)
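
As a reading aid for the wireproto.py hunk in the patch below, here is a minimal standalone sketch of the full-clone check it adds to getbundle. It does not import Mercurial: `is_full_clone_request` and `NULLID` are hypothetical names introduced for illustration (`NULLID` stands in for `mercurial.node.nullid`, assumed here to be 20 zero bytes), and node IDs are modeled as plain byte strings.

```python
# Stand-in for mercurial.node.nullid (assumption: 20 zero bytes).
NULLID = b"\0" * 20


def is_full_clone_request(changelog_heads, requested_heads, common):
    """Return True if a getbundle request looks like a full pull-based clone.

    Hypothetical helper mirroring the condition in the patch: the client
    shares no real changesets with the server (nullid does not count) and
    asks for exactly the set of changelog heads.
    """
    clheads = set(changelog_heads)
    heads = set(requested_heads or ())
    shared = set(common or ())
    # nullid is the parent of every root, so it carries no information.
    shared.discard(NULLID)
    return not shared and clheads == heads


# Illustrative node IDs, not real changesets.
head_a = b"\x11" * 20
head_b = b"\x22" * 20

# Fresh clone: nothing in common, all heads requested.
full = is_full_clone_request([head_a, head_b], [head_a, head_b], [NULLID])
# Partial clone (e.g. --rev): only a subset of heads requested.
partial = is_full_clone_request([head_a, head_b], [head_a], [])
# Ordinary pull: the client already has some changesets.
pull = is_full_clone_request([head_a, head_b], [head_a, head_b], [head_a])
```

In this sketch a request counts as a full clone only when the client shares nothing with the server and asks for every changelog head, which is consistent with the tests in the patch, where `--rev 0` clones and later pulls still succeed.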

Patch

diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
--- a/mercurial/help/config.txt
+++ b/mercurial/help/config.txt
@@ -1660,6 +1660,12 @@  Controls generic server settings.
     When set, clients will try to use the uncompressed streaming
     protocol. (default: False)
 
+``disablefullbundle``
+    When set, servers will refuse attempts to do pull-based clones.
+    If this option is set, ``preferuncompressed`` and/or clone bundles
+    are highly recommended. Partial clones will still be allowed.
+    (default: False)
+
 ``validate``
     Whether to validate the completeness of pushed changesets by
     checking that all new file revisions specified in manifests are
diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
--- a/mercurial/wireproto.py
+++ b/mercurial/wireproto.py
@@ -16,6 +16,7 @@  from .i18n import _
 from .node import (
     bin,
     hex,
+    nullid,
 )
 
 from . import (
@@ -841,6 +842,17 @@  def getbundle(repo, proto, others):
                               hint=bundle2requiredhint)
 
     try:
+        if repo.ui.configbool('server', 'disablefullbundle', False):
+            # Check to see if this is a full clone.
+            clheads = set(repo.changelog.heads())
+            heads = set(opts.get('heads', set()))
+            common = set(opts.get('common', set()))
+            common.discard(nullid)
+            if not common and clheads == heads:
+                raise error.Abort(
+                    _('server has pull-based clones disabled'),
+                    hint=_('remove --pull if specified or upgrade Mercurial'))
+
         chunks = exchange.getbundlechunks(repo, 'serve', **opts)
     except error.Abort as exc:
         # cleanly forward Abort error to the client
diff --git a/tests/test-http-bundle1.t b/tests/test-http-bundle1.t
--- a/tests/test-http-bundle1.t
+++ b/tests/test-http-bundle1.t
@@ -365,3 +365,41 @@  Check error reporting while pulling/clon
   this is an exercise
   [255]
   $ cat error.log
+
+disable pull-based clones
+
+  $ hg -R test serve -p $HGPORT1 -d --pid-file=hg4.pid -E error.log --config server.disablefullbundle=True
+  $ cat hg4.pid >> $DAEMON_PIDS
+  $ hg clone http://localhost:$HGPORT1/ disable-pull-clone
+  requesting all changes
+  abort: remote error:
+  server has pull-based clones disabled
+  [255]
+
+... but keep stream clones working
+
+  $ hg clone --uncompressed --noupdate http://localhost:$HGPORT1/ test-stream-clone
+  streaming all changes
+  * files to transfer, * of data (glob)
+  transferred * in * seconds (* KB/sec) (glob)
+  searching for changes
+  no changes found
+
+... and also keep partial clones and pulls working
+  $ hg clone http://localhost:$HGPORT1 --rev 0 test-partial-clone
+  adding changesets
+  adding manifests
+  adding file changes
+  added 1 changesets with 4 changes to 4 files
+  updating to branch default
+  4 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg pull -R test-partial-clone
+  pulling from http://localhost:$HGPORT1/
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 3 changes to 3 files
+  (run 'hg update' to get a working copy)
+
+  $ cat error.log
diff --git a/tests/test-http.t b/tests/test-http.t
--- a/tests/test-http.t
+++ b/tests/test-http.t
@@ -354,6 +354,44 @@  check abort error reporting while pullin
   [255]
   $ cat error.log
 
+disable pull-based clones
+
+  $ hg -R test serve -p $HGPORT1 -d --pid-file=hg4.pid -E error.log --config server.disablefullbundle=True
+  $ cat hg4.pid >> $DAEMON_PIDS
+  $ hg clone http://localhost:$HGPORT1/ disable-pull-clone
+  requesting all changes
+  remote: abort: server has pull-based clones disabled
+  abort: pull failed on remote
+  (remove --pull if specified or upgrade Mercurial)
+  [255]
+
+... but keep stream clones working
+
+  $ hg clone --uncompressed --noupdate http://localhost:$HGPORT1/ test-stream-clone
+  streaming all changes
+  * files to transfer, * of data (glob)
+  transferred * in * seconds (*/sec) (glob)
+  searching for changes
+  no changes found
+  $ cat error.log
+
+... and also keep partial clones and pulls working
+  $ hg clone http://localhost:$HGPORT1 --rev 0 test-partial-clone
+  adding changesets
+  adding manifests
+  adding file changes
+  added 1 changesets with 4 changes to 4 files
+  updating to branch default
+  4 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg pull -R test-partial-clone
+  pulling from http://localhost:$HGPORT1/
+  searching for changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 2 changesets with 3 changes to 3 files
+  (run 'hg update' to get a working copy)
+
 corrupt cookies file should yield a warning
 
   $ cat > $TESTTMP/cookies.txt << EOF