Submitter | Augie Fackler |
---|---|
Date | July 16, 2015, 3:36 p.m. |
Message ID | <ea8c6483a58df0eb0c29.1437061008@imladris.local> |
Download | mbox | patch |
Permalink | /patch/10005/ |
State | Accepted |
Headers | show |
Comments
On 7/16/15, 8:36 AM, "raf@durin42.com" <raf@durin42.com> wrote: ># HG changeset patch ># User Augie Fackler <augie@google.com> ># Date 1435699951 14400 ># Tue Jun 30 17:32:31 2015 -0400 ># Node ID ea8c6483a58df0eb0c296abc17cc846be80ed8cb ># Parent 0080461e46716fa9732df990232f968ea3eb6d28 >remotefilelog: introduce new getfile method > >Right now, this is a naive fetch-one-file method. The next change will >mark the method as batchable and use a batch in the client so that >many files can be requested in a single RPC. > >diff --git a/remotefilelog/remotefilelogserver.py >b/remotefilelog/remotefilelogserver.py >--- a/remotefilelog/remotefilelogserver.py >+++ b/remotefilelog/remotefilelogserver.py >@@ -53,6 +53,7 @@ def onetimesetup(ui): > > # support file content requests > wireproto.commands['getfiles'] = (getfiles, '') >+ wireproto.commands['getfile'] = (getfile, 'file node') > > class streamstate(object): > match = None >@@ -155,9 +156,11 @@ def onetimesetup(ui): > def _capabilities(orig, repo, proto): > caps = orig(repo, proto) > if ((shallowrepo.requirement in repo.requirements or >- ui.configbool('remotefilelog', 'server')) >- and isinstance(proto, sshserver.sshserver)): >- caps.append(shallowrepo.requirement) >+ ui.configbool('remotefilelog', 'server'))): >+ if isinstance(proto, sshserver.sshserver): >+ # legacy getfiles method which only works over ssh >+ caps.append(shallowrepo.requirement) >+ caps.append("getfile") By always appending the getfile capability, won't this force even ssh connections to use the new behavior? If that's intentional, do we have numbers on how that affects perf?
> On Jul 16, 2015, at 1:36 PM, Durham Goode <durham@fb.com> wrote: > > > > On 7/16/15, 8:36 AM, "raf@durin42.com" <raf@durin42.com> wrote: > >> # HG changeset patch >> # User Augie Fackler <augie@google.com> >> # Date 1435699951 14400 >> # Tue Jun 30 17:32:31 2015 -0400 >> # Node ID ea8c6483a58df0eb0c296abc17cc846be80ed8cb >> # Parent 0080461e46716fa9732df990232f968ea3eb6d28 >> remotefilelog: introduce new getfile method >> >> Right now, this is a naive fetch-one-file method. The next change will >> mark the method as batchable and use a batch in the client so that >> many files can be requested in a single RPC. > >> >> diff --git a/remotefilelog/remotefilelogserver.py >> b/remotefilelog/remotefilelogserver.py >> --- a/remotefilelog/remotefilelogserver.py >> +++ b/remotefilelog/remotefilelogserver.py >> @@ -53,6 +53,7 @@ def onetimesetup(ui): >> >> # support file content requests >> wireproto.commands['getfiles'] = (getfiles, '') >> + wireproto.commands['getfile'] = (getfile, 'file node') >> >> class streamstate(object): >> match = None >> @@ -155,9 +156,11 @@ def onetimesetup(ui): >> def _capabilities(orig, repo, proto): >> caps = orig(repo, proto) >> if ((shallowrepo.requirement in repo.requirements or >> - ui.configbool('remotefilelog', 'server')) >> - and isinstance(proto, sshserver.sshserver)): >> - caps.append(shallowrepo.requirement) >> + ui.configbool('remotefilelog', 'server'))): >> + if isinstance(proto, sshserver.sshserver): >> + # legacy getfiles method which only works over ssh >> + caps.append(shallowrepo.requirement) >> + caps.append("getfile") > > By always appending the getfile capability, won't this force even ssh > connections to use the new behavior? If that's intentional, do we have > numbers on how that affects perf? I think (though I didn’t recheck just now) that the remotefilelog codepath in the client is still preferred. We could certainly make it that way. I don’t have a good way to get performance numbers for this code - heck, I only just started hacking on it. In *theory*, the request batching should do something reasonable to keep performance from being awful. If that’s not happening in hg, we should fix request batching for ssh IMO.
On 7/16/15, 11:14 AM, "Augie Fackler" <raf@durin42.com> wrote: > >> On Jul 16, 2015, at 1:36 PM, Durham Goode <durham@fb.com> wrote: >> >> >> >> On 7/16/15, 8:36 AM, "raf@durin42.com" <raf@durin42.com> wrote: >> >>> # HG changeset patch >>> # User Augie Fackler <augie@google.com> >>> # Date 1435699951 14400 >>> # Tue Jun 30 17:32:31 2015 -0400 >>> # Node ID ea8c6483a58df0eb0c296abc17cc846be80ed8cb >>> # Parent 0080461e46716fa9732df990232f968ea3eb6d28 >>> remotefilelog: introduce new getfile method >>> >>> Right now, this is a naive fetch-one-file method. The next change will >>> mark the method as batchable and use a batch in the client so that >>> many files can be requested in a single RPC. >> >>> >>> diff --git a/remotefilelog/remotefilelogserver.py >>> b/remotefilelog/remotefilelogserver.py >>> --- a/remotefilelog/remotefilelogserver.py >>> +++ b/remotefilelog/remotefilelogserver.py >>> @@ -53,6 +53,7 @@ def onetimesetup(ui): >>> >>> # support file content requests >>> wireproto.commands['getfiles'] = (getfiles, '') >>> + wireproto.commands['getfile'] = (getfile, 'file node') >>> >>> class streamstate(object): >>> match = None >>> @@ -155,9 +156,11 @@ def onetimesetup(ui): >>> def _capabilities(orig, repo, proto): >>> caps = orig(repo, proto) >>> if ((shallowrepo.requirement in repo.requirements or >>> - ui.configbool('remotefilelog', 'server')) >>> - and isinstance(proto, sshserver.sshserver)): >>> - caps.append(shallowrepo.requirement) >>> + ui.configbool('remotefilelog', 'server'))): >>> + if isinstance(proto, sshserver.sshserver): >>> + # legacy getfiles method which only works over ssh >>> + caps.append(shallowrepo.requirement) >>> + caps.append("getfile") >> >> By always appending the getfile capability, won't this force even ssh >> connections to use the new behavior? If that's intentional, do we have >> numbers on how that affects perf? > >I think (though I didn¹t recheck just now) that the remotefilelog >codepath in the client is still preferred. We could certainly make it >that way. A few lines up in this patch it has: + if remote.capable("getfile"): + _getfilesbatch(... + else: + ... + _getfiles( Which made me think it prefers getfile over getfiles. If we reverse this, I can take this patch without worrying about perf (and I can test the perf later). > >I don¹t have a good way to get performance numbers for this code - heck, >I only just started hacking on it. In *theory*, the request batching >should do something reasonable to keep performance from being awful. If >that¹s not happening in hg, we should fix request batching for ssh IMO. My normal test is to take a large repo (mozilla central might work), mark it as a remotefilelog server, then just do a 'time hg prefetch -r tip' from a client with an empty cache. This first time will be slow (since it builds the blobs), but the second time will be accurate.
> On Jul 16, 2015, at 2:23 PM, Durham Goode <durham@fb.com> wrote: > > > > On 7/16/15, 11:14 AM, "Augie Fackler" <raf@durin42.com> wrote: > >> >>> On Jul 16, 2015, at 1:36 PM, Durham Goode <durham@fb.com> wrote: >>> >>> >>> >>> On 7/16/15, 8:36 AM, "raf@durin42.com" <raf@durin42.com> wrote: >>> >>>> # HG changeset patch >>>> # User Augie Fackler <augie@google.com> >>>> # Date 1435699951 14400 >>>> # Tue Jun 30 17:32:31 2015 -0400 >>>> # Node ID ea8c6483a58df0eb0c296abc17cc846be80ed8cb >>>> # Parent 0080461e46716fa9732df990232f968ea3eb6d28 >>>> remotefilelog: introduce new getfile method >>>> >>>> Right now, this is a naive fetch-one-file method. The next change will >>>> mark the method as batchable and use a batch in the client so that >>>> many files can be requested in a single RPC. >>> >>>> >>>> diff --git a/remotefilelog/remotefilelogserver.py >>>> b/remotefilelog/remotefilelogserver.py >>>> --- a/remotefilelog/remotefilelogserver.py >>>> +++ b/remotefilelog/remotefilelogserver.py >>>> @@ -53,6 +53,7 @@ def onetimesetup(ui): >>>> >>>> # support file content requests >>>> wireproto.commands['getfiles'] = (getfiles, '') >>>> + wireproto.commands['getfile'] = (getfile, 'file node') >>>> >>>> class streamstate(object): >>>> match = None >>>> @@ -155,9 +156,11 @@ def onetimesetup(ui): >>>> def _capabilities(orig, repo, proto): >>>> caps = orig(repo, proto) >>>> if ((shallowrepo.requirement in repo.requirements or >>>> - ui.configbool('remotefilelog', 'server')) >>>> - and isinstance(proto, sshserver.sshserver)): >>>> - caps.append(shallowrepo.requirement) >>>> + ui.configbool('remotefilelog', 'server'))): >>>> + if isinstance(proto, sshserver.sshserver): >>>> + # legacy getfiles method which only works over ssh >>>> + caps.append(shallowrepo.requirement) >>>> + caps.append("getfile") >>> >>> By always appending the getfile capability, won't this force even ssh >>> connections to use the new behavior? If that's intentional, do we have >>> numbers on how that affects perf? >> >> I think (though I didn¹t recheck just now) that the remotefilelog >> codepath in the client is still preferred. We could certainly make it >> that way. > > A few lines up in this patch it has: > > + if remote.capable("getfile"): > + _getfilesbatch(... > + else: > > + ... > + _getfiles( > > Which made me think it prefers getfile over getfiles. If we reverse this, > I can take this patch without worrying about perf (and I can test the perf > later). That works for me. Please feel encouraged to just make that change. (if you’d rather I make the change, let me know and I can take care of it eventually) > >> >> I don¹t have a good way to get performance numbers for this code - heck, >> I only just started hacking on it. In *theory*, the request batching >> should do something reasonable to keep performance from being awful. If >> that¹s not happening in hg, we should fix request batching for ssh IMO. > > My normal test is to take a large repo (mozilla central might work), mark > it as a remotefilelog server, then just do a 'time hg prefetch -r tip' > from a client with an empty cache. This first time will be slow (since it > builds the blobs), but the second time will be accurate.
On 7/16/15, 1:07 PM, "Augie Fackler" <raf@durin42.com> wrote: > >> On Jul 16, 2015, at 2:23 PM, Durham Goode <durham@fb.com> wrote: >> >> >> >> On 7/16/15, 11:14 AM, "Augie Fackler" <raf@durin42.com> wrote: >> >>> >>>> On Jul 16, 2015, at 1:36 PM, Durham Goode <durham@fb.com> wrote: >>>> >>>> >>>> >>>> On 7/16/15, 8:36 AM, "raf@durin42.com" <raf@durin42.com> wrote: >>>> >>>>> # HG changeset patch >>>>> # User Augie Fackler <augie@google.com> >>>>> # Date 1435699951 14400 >>>>> # Tue Jun 30 17:32:31 2015 -0400 >>>>> # Node ID ea8c6483a58df0eb0c296abc17cc846be80ed8cb >>>>> # Parent 0080461e46716fa9732df990232f968ea3eb6d28 >>>>> remotefilelog: introduce new getfile method >>>>> >>>>> Right now, this is a naive fetch-one-file method. The next change >>>>>will >>>>> mark the method as batchable and use a batch in the client so that >>>>> many files can be requested in a single RPC. >>>> >>>>> >>>>> diff --git a/remotefilelog/remotefilelogserver.py >>>>> b/remotefilelog/remotefilelogserver.py >>>>> --- a/remotefilelog/remotefilelogserver.py >>>>> +++ b/remotefilelog/remotefilelogserver.py >>>>> @@ -53,6 +53,7 @@ def onetimesetup(ui): >>>>> >>>>> # support file content requests >>>>> wireproto.commands['getfiles'] = (getfiles, '') >>>>> + wireproto.commands['getfile'] = (getfile, 'file node') >>>>> >>>>> class streamstate(object): >>>>> match = None >>>>> @@ -155,9 +156,11 @@ def onetimesetup(ui): >>>>> def _capabilities(orig, repo, proto): >>>>> caps = orig(repo, proto) >>>>> if ((shallowrepo.requirement in repo.requirements or >>>>> - ui.configbool('remotefilelog', 'server')) >>>>> - and isinstance(proto, sshserver.sshserver)): >>>>> - caps.append(shallowrepo.requirement) >>>>> + ui.configbool('remotefilelog', 'server'))): >>>>> + if isinstance(proto, sshserver.sshserver): >>>>> + # legacy getfiles method which only works over ssh >>>>> + caps.append(shallowrepo.requirement) >>>>> + caps.append("getfile") >>>> >>>> By always appending the getfile capability, won't this force even ssh >>>> connections to use the new behavior? If that's intentional, do we have >>>> numbers on how that affects perf? >>> >>> I think (though I didn¹t recheck just now) that the remotefilelog >>> codepath in the client is still preferred. We could certainly make it >>> that way. >> >> A few lines up in this patch it has: >> >> + if remote.capable("getfile"): >> + _getfilesbatch(... >> + else: >> >> + ... >> + _getfiles( >> >> Which made me think it prefers getfile over getfiles. If we reverse >>this, >> I can take this patch without worrying about perf (and I can test the >>perf >> later). > >That works for me. Please feel encouraged to just make that change. > >(if you’d rather I make the change, let me know and I can take care of it >eventually) I'll make it. Should be pushed along with the rest later today. > >> >>> >>> I don¹t have a good way to get performance numbers for this code - >>>heck, >>> I only just started hacking on it. In *theory*, the request batching >>> should do something reasonable to keep performance from being awful. If >>> that¹s not happening in hg, we should fix request batching for ssh IMO. >> >> My normal test is to take a large repo (mozilla central might work), >>mark >> it as a remotefilelog server, then just do a 'time hg prefetch -r tip' >> from a client with an empty cache. This first time will be slow (since >>it >> builds the blobs), but the second time will be accurate. >
> On Jul 16, 2015, at 4:09 PM, Durham Goode <durham@fb.com> wrote: > >> That works for me. Please feel encouraged to just make that change. >> >> (if you’d rather I make the change, let me know and I can take care of it >> eventually) > > I'll make it. Should be pushed along with the rest later today. Awesome, many thanks!
All three of your series have been pushed. Thanks again On 7/16/15, 1:09 PM, "Augie Fackler" <raf@durin42.com> wrote: > >> On Jul 16, 2015, at 4:09 PM, Durham Goode <durham@fb.com> wrote: >> >>> That works for me. Please feel encouraged to just make that change. >>> >>> (if you¹d rather I make the change, let me know and I can take care of >>>it >>> eventually) >> >> I'll make it. Should be pushed along with the rest later today. > >Awesome, many thanks!
Patch
diff --git a/remotefilelog/__init__.py b/remotefilelog/__init__.py --- a/remotefilelog/__init__.py +++ b/remotefilelog/__init__.py @@ -35,6 +35,8 @@ else: def uisetup(ui): """Wraps user facing Mercurial commands to swap them out with shallow versions. """ + hg.wirepeersetupfuncs.append(fileserverclient.peersetup) + entry = extensions.wrapcommand(commands.table, 'clone', cloneshallow) entry[1].append(('', 'shallow', None, _("create a shallow clone which uses remote file history"))) diff --git a/remotefilelog/fileserverclient.py b/remotefilelog/fileserverclient.py --- a/remotefilelog/fileserverclient.py +++ b/remotefilelog/fileserverclient.py @@ -6,8 +6,8 @@ # GNU General Public License version 2 or any later version. from mercurial.i18n import _ -from mercurial import util, sshpeer, hg, error, util -import os, socket, lz4, time, grp +from mercurial import util, sshpeer, hg, error, util, wireproto +import os, socket, lz4, time, grp, io # Statistics for debugging fetchcost = 0 @@ -34,6 +34,12 @@ def getlocalkey(file, id): pathhash = util.sha1(file).hexdigest() return os.path.join(pathhash, id) +def peersetup(ui, peer): + class remotefilepeer(peer.__class__): + def getfile(self, file, node): + return self._call('getfile', file=file, node=node) + peer.__class__ = remotefilepeer + class cacheconnection(object): """The connection for communicating with the remote cache. Performs gets and sets by communicating with an external process that has the @@ -85,6 +91,14 @@ class cacheconnection(object): return result +def _getfilesbatch(remote, receivemissing, progresstick, missed, idmap): + for m in missed: + file_ = idmap[m] + node = m[-40:] + v = remote.getfile(file_, node) + receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m) + progresstick() + def _getfiles( remote, receivemissing, fallbackpath, progresstick, missed, idmap): remote._callstream("getfiles") @@ -199,14 +213,18 @@ class fileserverclient(object): raise util.Abort("no remotefilelog server configured - " "is your .hg/hgrc trusted?") remote = hg.peer(self.ui, {}, fallbackpath) - # TODO: deduplicate this with the constant in shallowrepo - if not remote.capable("remotefilelog"): - raise util.Abort("configured remotefilelog server" - " does not support remotefilelog") - if not isinstance(remote, sshpeer.sshpeer): - raise util.Abort('remotefilelog requires ssh servers') - _getfiles(remote, self.receivemissing, fallbackpath, - progresstick, missed, idmap) + if remote.capable("getfile"): + _getfilesbatch( + remote, self.receivemissing, progresstick, missed, idmap) + else: + # TODO: deduplicate this with the constant in shallowrepo + if not remote.capable("remotefilelog"): + raise util.Abort("configured remotefilelog server" + " does not support remotefilelog") + if not isinstance(remote, sshpeer.sshpeer): + raise util.Abort('remotefilelog requires ssh servers') + _getfiles(remote, self.receivemissing, fallbackpath, + progresstick, missed, idmap) finally: self.ui.verbse = verbose # send to memcache diff --git a/remotefilelog/remotefilelogserver.py b/remotefilelog/remotefilelogserver.py --- a/remotefilelog/remotefilelogserver.py +++ b/remotefilelog/remotefilelogserver.py @@ -53,6 +53,7 @@ def onetimesetup(ui): # support file content requests wireproto.commands['getfiles'] = (getfiles, '') + wireproto.commands['getfile'] = (getfile, 'file node') class streamstate(object): match = None @@ -155,9 +156,11 @@ def onetimesetup(ui): def _capabilities(orig, repo, proto): caps = orig(repo, proto) if ((shallowrepo.requirement in repo.requirements or - ui.configbool('remotefilelog', 'server')) - and isinstance(proto, sshserver.sshserver)): - caps.append(shallowrepo.requirement) + ui.configbool('remotefilelog', 'server'))): + if isinstance(proto, sshserver.sshserver): + # legacy getfiles method which only works over ssh + caps.append(shallowrepo.requirement) + caps.append("getfile") return caps wrapfunction(wireproto, '_capabilities', _capabilities) @@ -209,6 +212,17 @@ def _loadfileblob(repo, cachepath, path, text = f.read() return text +def getfile(repo, proto, file, node): + if shallowrepo.requirement in repo.requirements: + raise util.Abort(_('cannot fetch remote files from shallow repo')) + cachepath = repo.ui.config("remotefilelog", "servercachepath") + if not cachepath: + cachepath = os.path.join(repo.path, "remotefilelogcache") + node = bin(node.strip()) + if node == nullid: + return '' + return _loadfileblob(repo, cachepath, file, node) + def getfiles(repo, proto): """A server api for requesting particular versions of particular files. """ diff --git a/tests/test-http.t b/tests/test-http.t --- a/tests/test-http.t +++ b/tests/test-http.t @@ -14,8 +14,7 @@ $ cat hg1.pid >> $DAEMON_PIDS $ hgcloneshallow http://localhost:$HGPORT/ shallow -q - 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over 0.00s - abort: configured remotefilelog server does not support remotefilelog + 1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob) The 'remotefilelog' capability should *not* be exported over http(s), @@ -23,11 +22,11 @@ as the getfile method it offers doesn't $ get-with-headers.py localhost:$HGPORT '?cmd=capabilities' 200 Script output follows - lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Aerror%3Dabort%2Cunsupportedcontent%2Cpushraced%2Cpushkey%0Ahgtagsfnodes%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 (no-eol) + lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Aerror%3Dabort%2Cunsupportedcontent%2Cpushraced%2Cpushkey%0Ahgtagsfnodes%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 getfile (no-eol) $ get-with-headers.py localhost:$HGPORT '?cmd=hello' 200 Script output follows - capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Aerror%3Dabort%2Cunsupportedcontent%2Cpushraced%2Cpushkey%0Ahgtagsfnodes%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 + capabilities: lookup changegroupsubset branchmap pushkey known getbundle unbundlehash batch stream-preferred stream bundle2=HG20%0Achangegroup%3D01%2C02%0Adigests%3Dmd5%2Csha1%2Csha512%0Aerror%3Dabort%2Cunsupportedcontent%2Cpushraced%2Cpushkey%0Ahgtagsfnodes%0Alistkeys%0Apushkey%0Aremote-changegroup%3Dhttp%2Chttps unbundle=HG10GZ,HG10BZ,HG10UN httpheader=1024 getfile $ get-with-headers.py localhost:$HGPORT '?cmd=this-command-does-not-exist' | head -n 1 400 no such method: this-command-does-not-exist diff --git a/tests/test-push-pull.t b/tests/test-push-pull.t --- a/tests/test-push-pull.t +++ b/tests/test-push-pull.t @@ -21,10 +21,10 @@ the server supports our custom getfiles $ cd master $ echo 'hello' | hg serve --stdio * (glob) - capabilities: * remotefilelog (glob) + capabilities: lookup * remotefilelog getfile (glob) $ echo 'capabilities' | hg serve --stdio ; echo * (glob) - * remotefilelog (glob) + * remotefilelog getfile (glob) # pull to shallow from full