Patchwork [2,of,2,remotefilelog-ext,getfile-batching] fileserverclient: add config knob to control batch size

login
register
mail settings
Submitter Augie Fackler
Date Aug. 18, 2015, 8:07 p.m.
Message ID <802538ce2a9be5448284.1439928446@arthedain.pit.corp.google.com>
Download mbox | patch
Permalink /patch/10233/
State Accepted
Headers show

Comments

Augie Fackler - Aug. 18, 2015, 8:07 p.m.
# HG changeset patch
# User Augie Fackler <augie@google.com>
# Date 1439925241 14400
#      Tue Aug 18 15:14:01 2015 -0400
# Node ID 802538ce2a9be544828417ff0a86aba847a63307
# Parent  1183fcaf20272ca5dbcc0c308e9e11b1c7c97093
fileserverclient: add config knob to control batch size

Previously we'd just send one enormous batch for everything to the
server. This led to prolonged periods of no progress output for the
user. Now we send batches in smaller chunks (default is 100) which
gives the user some idea that things are working.

Includes a trivial test, which doesn't really verify that the batching
logic is used as described, but at least prevents the boneheaded error
I had in an earlier (unmailed) version of this patch which forgot to
use configint() when loading the config setting.
Gregory Szorc - Aug. 20, 2015, 3:48 p.m.
> On Aug 18, 2015, at 13:07, Augie Fackler <raf@durin42.com> wrote:
> 
> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1439925241 14400
> #      Tue Aug 18 15:14:01 2015 -0400
> # Node ID 802538ce2a9be544828417ff0a86aba847a63307
> # Parent  1183fcaf20272ca5dbcc0c308e9e11b1c7c97093
> fileserverclient: add config knob to control batch size
> 
> Previously we'd just send one enormous batch for everything to the
> server. This led to prolonged periods of no progress output for the
> user. Now we send batches in smaller chunks (default is 100) which
> gives the user some idea that things are working.
> 
> Includes a trivial test, which doesn't really verify that the batching
> logic is used as described, but at least prevents the boneheaded error
> I had in an earlier (unmailed) version of this patch which forgot to
> use configint() when loading the config setting.

Could you cat the server's access log to verify batching is being performed?

> 
> diff --git a/remotefilelog/fileserverclient.py b/remotefilelog/fileserverclient.py
> --- a/remotefilelog/fileserverclient.py
> +++ b/remotefilelog/fileserverclient.py
> @@ -100,18 +100,21 @@ class cacheconnection(object):
> 
>         return result
> 
> -def _getfilesbatch(remote, receivemissing, progresstick, missed, idmap):
> -    b = remote.batch()
> -    futures = {}
> -    for m in missed:
> -        file_ = idmap[m]
> -        node = m[-40:]
> -        futures[m] = b.getfile(file_, node)
> -    b.submit()
> -    for m in missed:
> -        v = futures[m].value
> -        receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
> -        progresstick()
> +def _getfilesbatch(
> +        remote, receivemissing, progresstick, missed, idmap, batchsize):
> +    while missed:
> +        chunk, missed = missed[:batchsize], missed[batchsize:]
> +        b = remote.batch()
> +        futures = {}
> +        for m in chunk:
> +            file_ = idmap[m]
> +            node = m[-40:]
> +            futures[m] = b.getfile(file_, node)
> +        b.submit()
> +        for m in chunk:
> +            v = futures[m].value
> +            receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
> +            progresstick()
> 
> def _getfiles(
>     remote, receivemissing, fallbackpath, progresstick, missed, idmap):
> @@ -234,8 +237,12 @@ class fileserverclient(object):
>                         _getfiles(remote, self.receivemissing, fallbackpath,
>                                   progresstick, missed, idmap)
>                     elif remote.capable("getfile"):
> +                        batchdefault = 100 if remote.capable('batch') else 10
> +                        batchsize = self.ui.configint(
> +                            'remotefilelog', 'batchsize', batchdefault)
>                         _getfilesbatch(
> -                            remote, self.receivemissing, progresstick, missed, idmap)
> +                            remote, self.receivemissing, progresstick, missed,
> +                            idmap, batchsize)
>                     else:
>                         raise util.Abort("configured remotefilelog server"
>                                          " does not support remotefilelog")
> diff --git a/tests/test-http.t b/tests/test-http.t
> --- a/tests/test-http.t
> +++ b/tests/test-http.t
> @@ -20,6 +20,12 @@ Build a query string for later use:
>   $ hgcloneshallow http://localhost:$HGPORT/ shallow -q
>   1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
> 
> +Clear filenode cache so we can test fetching with a modified batch size
> +  $ rm -r $TESTTMP/hgcache
> +Now do a fetch with a large batch size so we're sure it works
> +  $ hgcloneshallow http://localhost:$HGPORT/ shallow-large-batch \
> +  >    --config remotefilelog.batchsize=1000 -q
> +  1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
> 
> The 'remotefilelog' capability should *not* be exported over http(s),
> as the getfile method it offers doesn't work with http.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
Augie Fackler - Aug. 21, 2015, 5:02 p.m.
On Thu, Aug 20, 2015 at 08:48:46AM -0700, Gregory Szorc wrote:
>
>
> > On Aug 18, 2015, at 13:07, Augie Fackler <raf@durin42.com> wrote:
> >
> > # HG changeset patch
> > # User Augie Fackler <augie@google.com>
> > # Date 1439925241 14400
> > #      Tue Aug 18 15:14:01 2015 -0400
> > # Node ID 802538ce2a9be544828417ff0a86aba847a63307
> > # Parent  1183fcaf20272ca5dbcc0c308e9e11b1c7c97093
> > fileserverclient: add config knob to control batch size
> >
> > Previously we'd just send one enormous batch for everything to the
> > server. This led to prolonged periods of no progress output for the
> > user. Now we send batches in smaller chunks (default is 100) which
> > gives the user some idea that things are working.
> >
> > Includes a trivial test, which doesn't really verify that the batching
> > logic is used as described, but at least prevents the boneheaded error
> > I had in an earlier (unmailed) version of this patch which forgot to
> > use configint() when loading the config setting.
>
> Could you cat the server's access log to verify batching is being performed?

Good idea. I'll do that as a followup, since it's slightly orthogonal
to this patch. Thanks!

>
> >
> > diff --git a/remotefilelog/fileserverclient.py b/remotefilelog/fileserverclient.py
> > --- a/remotefilelog/fileserverclient.py
> > +++ b/remotefilelog/fileserverclient.py
> > @@ -100,18 +100,21 @@ class cacheconnection(object):
> >
> >         return result
> >
> > -def _getfilesbatch(remote, receivemissing, progresstick, missed, idmap):
> > -    b = remote.batch()
> > -    futures = {}
> > -    for m in missed:
> > -        file_ = idmap[m]
> > -        node = m[-40:]
> > -        futures[m] = b.getfile(file_, node)
> > -    b.submit()
> > -    for m in missed:
> > -        v = futures[m].value
> > -        receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
> > -        progresstick()
> > +def _getfilesbatch(
> > +        remote, receivemissing, progresstick, missed, idmap, batchsize):
> > +    while missed:
> > +        chunk, missed = missed[:batchsize], missed[batchsize:]
> > +        b = remote.batch()
> > +        futures = {}
> > +        for m in chunk:
> > +            file_ = idmap[m]
> > +            node = m[-40:]
> > +            futures[m] = b.getfile(file_, node)
> > +        b.submit()
> > +        for m in chunk:
> > +            v = futures[m].value
> > +            receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
> > +            progresstick()
> >
> > def _getfiles(
> >     remote, receivemissing, fallbackpath, progresstick, missed, idmap):
> > @@ -234,8 +237,12 @@ class fileserverclient(object):
> >                         _getfiles(remote, self.receivemissing, fallbackpath,
> >                                   progresstick, missed, idmap)
> >                     elif remote.capable("getfile"):
> > +                        batchdefault = 100 if remote.capable('batch') else 10
> > +                        batchsize = self.ui.configint(
> > +                            'remotefilelog', 'batchsize', batchdefault)
> >                         _getfilesbatch(
> > -                            remote, self.receivemissing, progresstick, missed, idmap)
> > +                            remote, self.receivemissing, progresstick, missed,
> > +                            idmap, batchsize)
> >                     else:
> >                         raise util.Abort("configured remotefilelog server"
> >                                          " does not support remotefilelog")
> > diff --git a/tests/test-http.t b/tests/test-http.t
> > --- a/tests/test-http.t
> > +++ b/tests/test-http.t
> > @@ -20,6 +20,12 @@ Build a query string for later use:
> >   $ hgcloneshallow http://localhost:$HGPORT/ shallow -q
> >   1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
> >
> > +Clear filenode cache so we can test fetching with a modified batch size
> > +  $ rm -r $TESTTMP/hgcache
> > +Now do a fetch with a large batch size so we're sure it works
> > +  $ hgcloneshallow http://localhost:$HGPORT/ shallow-large-batch \
> > +  >    --config remotefilelog.batchsize=1000 -q
> > +  1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
> >
> > The 'remotefilelog' capability should *not* be exported over http(s),
> > as the getfile method it offers doesn't work with http.
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
Durham Goode - Aug. 24, 2015, 10:56 p.m.
On 8/18/15 1:07 PM, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1439925241 14400
> #      Tue Aug 18 15:14:01 2015 -0400
> # Node ID 802538ce2a9be544828417ff0a86aba847a63307
> # Parent  1183fcaf20272ca5dbcc0c308e9e11b1c7c97093
> fileserverclient: add config knob to control batch size
>
> Previously we'd just send one enormous batch for everything to the
> server. This led to prolonged periods of no progress output for the
> user. Now we send batches in smaller chunks (default is 100) which
> gives the user some idea that things are working.
>
> Includes a trivial test, which doesn't really verify that the batching
> logic is used as described, but at least prevents the boneheaded error
> I had in an earlier (unmailed) version of this patch which forgot to
> use configint() when loading the config setting.
>
Queued and pushed both. Thanks!

Patch

diff --git a/remotefilelog/fileserverclient.py b/remotefilelog/fileserverclient.py
--- a/remotefilelog/fileserverclient.py
+++ b/remotefilelog/fileserverclient.py
@@ -100,18 +100,21 @@  class cacheconnection(object):
 
         return result
 
-def _getfilesbatch(remote, receivemissing, progresstick, missed, idmap):
-    b = remote.batch()
-    futures = {}
-    for m in missed:
-        file_ = idmap[m]
-        node = m[-40:]
-        futures[m] = b.getfile(file_, node)
-    b.submit()
-    for m in missed:
-        v = futures[m].value
-        receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
-        progresstick()
+def _getfilesbatch(
+        remote, receivemissing, progresstick, missed, idmap, batchsize):
+    while missed:
+        chunk, missed = missed[:batchsize], missed[batchsize:]
+        b = remote.batch()
+        futures = {}
+        for m in chunk:
+            file_ = idmap[m]
+            node = m[-40:]
+            futures[m] = b.getfile(file_, node)
+        b.submit()
+        for m in chunk:
+            v = futures[m].value
+            receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
+            progresstick()
 
 def _getfiles(
     remote, receivemissing, fallbackpath, progresstick, missed, idmap):
@@ -234,8 +237,12 @@  class fileserverclient(object):
                         _getfiles(remote, self.receivemissing, fallbackpath,
                                   progresstick, missed, idmap)
                     elif remote.capable("getfile"):
+                        batchdefault = 100 if remote.capable('batch') else 10
+                        batchsize = self.ui.configint(
+                            'remotefilelog', 'batchsize', batchdefault)
                         _getfilesbatch(
-                            remote, self.receivemissing, progresstick, missed, idmap)
+                            remote, self.receivemissing, progresstick, missed,
+                            idmap, batchsize)
                     else:
                         raise util.Abort("configured remotefilelog server"
                                          " does not support remotefilelog")
diff --git a/tests/test-http.t b/tests/test-http.t
--- a/tests/test-http.t
+++ b/tests/test-http.t
@@ -20,6 +20,12 @@  Build a query string for later use:
   $ hgcloneshallow http://localhost:$HGPORT/ shallow -q
   1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
 
+Clear filenode cache so we can test fetching with a modified batch size
+  $ rm -r $TESTTMP/hgcache
+Now do a fetch with a large batch size so we're sure it works
+  $ hgcloneshallow http://localhost:$HGPORT/ shallow-large-batch \
+  >    --config remotefilelog.batchsize=1000 -q
+  1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
 
 The 'remotefilelog' capability should *not* be exported over http(s),
 as the getfile method it offers doesn't work with http.