Patchwork D2884: wireproto: experimental command to emit file data

Submitter phabricator
Date March 16, 2018, 11:07 p.m.
Message ID <differential-rev-PHID-DREV-756eymiuri7sxm34mlyb-req@phab.mercurial-scm.org>
Permalink /patch/29573/
State New

Comments

phabricator - March 16, 2018, 11:07 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Partial clones will require new wire protocol functionality to
  retrieve repository data. The remotefilelog extension - which
  implements various aspects of partial clone - adds a handful of
  wire protocol commands:
  
  getflogheads
  
    Obtain heads of a filelog
  
  getfile
  
    Obtain data for an individual file revision
  
  getfiles
  
    Batch version of getfile
  
  getpackv1
  
    Obtain a "pack file" containing index and data on multiple
    files
  
  (among others)
  
  Recently, the wire protocol has gained support for "obtain repository
  data" in the form of overloading the "getbundle" wire protocol
  command. This is arguably OK in the context of "all data is attached
  to bundles" and "bundles are a self-contained representation of
  complete repository data." But partial clone invalidates these
  assumptions because in a partial clone world, we can no longer assume
  things like "the client has all the base revisions."
  
  In a partial clone world, we'll need wire protocol commands that allow
  clients to obtain specific pieces of data with vastly different
  access patterns. For example, a client may want to obtain "index"
  data but keep the fulltext data on the server. Or vice-versa. Or a
  client may wish to fetch all revisions of a specific file but only
  the latest revision of another. These access patterns will be
  difficult to shoehorn into single, powerful commands (like
  "getbundle"). Even if we could, doing that isn't wise from a server
  implementation perspective because it makes implementing scalable
  servers hard. We want server-side commands to be small and simple
  so alternate server implementations can come into existence more
  easily.
  
  This is one reason why the frame-based wire protocol I'm implementing
  supports command pipelining and out-of-order responses. This
  property will enable clients performing complex operations to send
  command streams containing dozens or even hundreds of small command
  requests to servers.
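  
  As a rough illustration of why out-of-order responses matter to a
  client, the toy Python sketch below tracks in-flight requests by ID
  and accepts responses in any order. The `PipelinedClient` class, its
  method names, the integer request IDs, and the `path` argument are
  all hypothetical; none of this reflects the actual frame format.

```python
import itertools


class PipelinedClient:
    """Toy bookkeeping for pipelined requests (hypothetical scheme)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}   # request ID -> (command, args)
        self.completed = {}  # request ID -> (command, args, payload)

    def send(self, command, **args):
        """Record a request as in flight and return its ID."""
        request_id = next(self._ids)
        self._pending[request_id] = (command, args)
        return request_id

    def receive(self, request_id, payload):
        """Match a response to its request, whatever order it arrives in."""
        command, args = self._pending.pop(request_id)
        self.completed[request_id] = (command, args, payload)
        return command

    @property
    def inflight(self):
        return len(self._pending)


client = PipelinedClient()
first = client.send('exp-revfilesdata-001', node='a' * 40)
second = client.send('getflogheads', path='foo')  # argument name assumed
# The server may answer the second request before the first.
client.receive(second, b'heads')
client.receive(first, b'data')
```

  Because each response carries the ID of its request, a small command
  does not have to wait behind a large one that was issued earlier.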
  
  Anyway, this commit implements an experimental wire protocol command
  for "get files data." Essentially, you give it a changeset revision
  you are interested in and it spits back all the files and their data
  in that revision, as fulltexts.
  
  This command is just one way a server could emit data for files.
  A variation of this command that accepts specific file paths and nodes
  whose data is to be retrieved would also be useful. And I imagine we'll
  eventually implement that. It would also be useful to emit index
  data. Or have each file blob be individually compressed. (Right now
  compression is performed on the whole stream because that's how the
  wire protocol currently works - but I have plans to evolve the
  frame-based protocol to do novel things here.)
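  
  To make the whole-stream versus per-blob trade-off concrete, here is
  a small Python sketch using the standard `zlib` module. The function
  names and the synthetic blobs are made up for illustration; real
  repositories will show different ratios, and large files narrow the
  gap.

```python
import zlib


def whole_stream_size(blobs, level=6):
    """Compressed size when all blobs share one compression stream."""
    compressor = zlib.compressobj(level)
    size = 0
    for blob in blobs:
        size += len(compressor.compress(blob))
    size += len(compressor.flush())
    return size


def per_blob_size(blobs, level=6):
    """Compressed size when each blob is compressed independently."""
    return sum(len(zlib.compress(blob, level)) for blob in blobs)


# Many small, similar blobs: the worst case for per-blob compression.
blobs = [b'foo revision %d\n' % i for i in range(100)]
print(whole_stream_size(blobs), per_blob_size(blobs))
```

  Per-blob compression pays a fixed overhead per file and cannot share
  redundancy across files, but it lets a client decode (or a server
  cache) each blob on its own.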
  
  I'm not even sure this variation of the wire protocol command is a
  good one to have! One reason I want to start with this command is
  that it seems like a useful primitive. For example, with this
  command, one could build a client that is able to realize a working
  directory from a single wire protocol request: you can literally
  stream the response to this command and turn the data into files on
  the filesystem with minimal stream processing!
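  
  A minimal client-side sketch of that idea in Python follows. It
  relies on the record layout documented in the wireprotocol.txt change
  in this patch (a 31 byte `<20sHQB` header, then the path, then the
  data). The helper names are mine, the stream is assumed to be a
  buffered binary stream already positioned at the first record (after
  HTTP chunk decoding, the compression engine prefix, and any
  decompression), and `write_working_dir` assumes a POSIX filesystem
  and trusts the paths it is given.

```python
import io
import os
import struct

# Header layout from this patch: 20-byte file node, u16 path length,
# u64 data length, u8 flags; little-endian, no padding.
HEADER = struct.Struct('<20sHQB')
FLAG_EXEC = 0x01
FLAG_SYMLINK = 0x02


def read_exact(stream, size):
    """Read exactly ``size`` bytes from a buffered binary stream."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError('truncated stream')
    return data


def iter_records(stream):
    """Yield (node, path, data, flags) until the stream is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        if len(header) != HEADER.size:
            raise ValueError('truncated record header')
        node, pathlen, datalen, flags = HEADER.unpack(header)
        path = read_exact(stream, pathlen)
        data = read_exact(stream, datalen)
        yield node, path, data, flags


def write_working_dir(stream, root):
    """Write every record under ``root`` (no path sanitizing: a sketch)."""
    root = os.fsencode(root)
    for node, path, data, flags in iter_records(stream):
        dest = os.path.join(root, path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if flags & FLAG_SYMLINK:
            # For symlinks the data is the link target.
            os.symlink(data, dest)
        else:
            with open(dest, 'wb') as fh:
                fh.write(data)
            if flags & FLAG_EXEC:
                os.chmod(dest, 0o755)


# The single record from the "single file" test in this patch.
SAMPLE = (
    b'F\x92\xc6\xd5/y\x90\xcce\x0c\xea\x80\xd0\xca\xe1\xde6\xb5wX'
    b'\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00'
    b'foo'
    b'foo revision 0\n'
)
records = list(iter_records(io.BytesIO(SAMPLE)))
```

  A real client would also need to validate paths and handle platforms
  without symlinks or an executable bit.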
  
  As implemented, this command is effectively a benchmark of revlog
  reading and/or compression. On the mozilla-unified repository when
  operating on revision c488b8d0e074efb490ebca32db68eb77871bfd2f (a
  recent revision of mozilla-central, the head of Firefox development),
  my i7-6700K yields the following:
  
  - no compression: 1478MB;  ~94s wall; ~56s CPU
  - zstd level 3:    343MB;  ~97s wall; ~57s CPU
  - zlib level 6:    367MB; ~116s wall; ~74s CPU
  
  For comparison, `hg bundle --base null -r c488b8d0e0 -t zstd-v2`
  (which approximates what `hg clone -r` would be doing on the server)
  yields:
  
    1397MB; ~624s wall; ~225s CPU
  
  Of course, these are vastly different operations. But this does
  demonstrate that if your use case of version control is "check out
  revision X" and you were previously relying on `hg clone` [without
  stream clone bundles] to do that, this wire protocol command
  is overall much more efficient on servers. It's worth noting that
  the use case of version control for many automated systems *is*
  "check out revision X." So I think providing a clone mode that can
  realize a working copy as fast as possible is a worthwhile feature
  to have!
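  
  For anyone who wants the ratios rather than the raw figures, this
  short Python snippet derives them from the numbers quoted above. The
  dictionary keys and the `ratios` helper are just labels I chose; the
  inputs are the approximate measurements from this message and nothing
  else.

```python
# (size in MB, wall seconds, CPU seconds), copied from the figures above.
REVFILESDATA = {
    'none': (1478, 94, 56),
    'zstd-3': (343, 97, 57),
    'zlib-6': (367, 116, 74),
}
BUNDLE = (1397, 624, 225)


def ratios(engine):
    """Return (size vs uncompressed, wall vs bundle, CPU vs bundle)."""
    size, wall, cpu = REVFILESDATA[engine]
    raw_size = REVFILESDATA['none'][0]
    return raw_size / size, BUNDLE[1] / wall, BUNDLE[2] / cpu


for engine in REVFILESDATA:
    compression, wall_speedup, cpu_speedup = ratios(engine)
    print('%-7s %.1fx smaller, %.1fx less wall, %.1fx less CPU'
          % (engine, compression, wall_speedup, cpu_speedup))
```

  For zstd level 3 this works out to a stream roughly 4.3x smaller than
  the uncompressed one, produced in about 6.4x less wall time and about
  3.9x less CPU time than the bundle.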

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2884

AFFECTED FILES
  mercurial/configitems.py
  mercurial/help/internals/wireprotocol.txt
  mercurial/wireproto.py
  tests/test-wireproto-revsfiledata.t

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 22, 2018, 3:21 p.m.
indygreg planned changes to this revision.
indygreg added a comment.


  This needs rebasing. Please hold off reviewing.

To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/tests/test-wireproto-revsfiledata.t b/tests/test-wireproto-revsfiledata.t
new file mode 100644
--- /dev/null
+++ b/tests/test-wireproto-revsfiledata.t
@@ -0,0 +1,244 @@ 
+  $ CMDNAME=exp-revfilesdata-001
+
+  $ cat >> $HGRCPATH << EOF
+  > [server]
+  > compressionengines = none
+  > EOF
+
+  $ hg init server
+  $ cd server
+  $ echo 'foo revision 0' > foo
+  $ hg -q commit -A -m initial
+  $ echo 'foo revision 1' > foo
+  $ echo 'bar 0' > bar
+  $ hg -q commit -A -m second
+  $ chmod +x foo
+  $ hg commit -m third
+
+revfilesdata requires a config option
+
+  $ hg serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=irrelevant
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=irrelevant\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/hg-error\r\n
+  s>     Content-Length: 49\r\n
+  s>     \r\n
+  s>     revfilesdata wire protocol command is not enabled
+
+  $ cat >> $HGRCPATH << EOF
+  > [experimental]
+  > server.revfilesdata = true
+  > EOF
+
+  $ killdaemons.py
+  $ hg serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+Node must be full hash
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=tip
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=tip\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/hg-error\r\n
+  s>     Content-Length: 31\r\n
+  s>     \r\n
+  s>     nodes argument must be 40 bytes
+
+And it must be a known hash
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/hg-error\r\n
+  s>     Content-Length: 54\r\n
+  s>     \r\n
+  s>     unknown node: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+Request for revision with single file
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=a64d23ad96a87844da3723df73c209a1c5507999
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=a64d23ad96a87844da3723df73c209a1c5507999\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.2\r\n
+  s>     Transfer-Encoding: chunked\r\n
+  s>     \r\n
+  s>     1\r\n
+  s>     \x04
+  s>     \r\n
+  s>     4\r\n
+  s>     none
+  s>     \r\n
+  s>     1f\r\n
+  s>     F\x92\xc6\xd5/y\x90\xcce\x0c\xea\x80\xd0\xca\xe1\xde6\xb5wX\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00
+  s>     \r\n
+  s>     3\r\n
+  s>     foo
+  s>     \r\n
+  s>     f\r\n
+  s>     foo revision 0\n
+  s>     \r\n
+  s>     0\r\n
+  s>     \r\n
+
+Revision with multiple files
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=bc56cef01319bf181be2886f8a3aefea9a33bfdb
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=bc56cef01319bf181be2886f8a3aefea9a33bfdb\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.2\r\n
+  s>     Transfer-Encoding: chunked\r\n
+  s>     \r\n
+  s>     1\r\n
+  s>     \x04
+  s>     \r\n
+  s>     4\r\n
+  s>     none
+  s>     \r\n
+  s>     1f\r\n
+  s>     \xdb&\xb9\xed\xe1\xcc\xd5]\xdact\xb01\x14h\xda\xe3\xc2\xe2\xd9\x03\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00
+  s>     \r\n
+  s>     3\r\n
+  s>     bar
+  s>     \r\n
+  s>     6\r\n
+  s>     bar 0\n
+  s>     \r\n
+  s>     1f\r\n
+  s>     $\x95\x1c\xb3\x8e(\xc6>\xf8\x0cx\\\x88G\xbd\xd3[\x08\x13c\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00
+  s>     \r\n
+  s>     3\r\n
+  s>     foo
+  s>     \r\n
+  s>     f\r\n
+  s>     foo revision 1\n
+  s>     \r\n
+  s>     0\r\n
+  s>     \r\n
+
+And with the executable bit set
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET ?cmd=$CMDNAME
+  >     user-agent: test
+  >     x-hgarg-1: node=328fdcd53a5d2f0dd58397e1f1ed73d5913332fe
+  >     x-hgproto-1: 0.2
+  > EOF
+  using raw connection to peer
+  s>     GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     x-hgarg-1: node=328fdcd53a5d2f0dd58397e1f1ed73d5913332fe\r\n
+  s>     x-hgproto-1: 0.2\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.2\r\n
+  s>     Transfer-Encoding: chunked\r\n
+  s>     \r\n
+  s>     1\r\n
+  s>     \x04
+  s>     \r\n
+  s>     4\r\n
+  s>     none
+  s>     \r\n
+  s>     1f\r\n
+  s>     \xdb&\xb9\xed\xe1\xcc\xd5]\xdact\xb01\x14h\xda\xe3\xc2\xe2\xd9\x03\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00
+  s>     \r\n
+  s>     3\r\n
+  s>     bar
+  s>     \r\n
+  s>     6\r\n
+  s>     bar 0\n
+  s>     \r\n
+  s>     1f\r\n
+  s>     $\x95\x1c\xb3\x8e(\xc6>\xf8\x0cx\\\x88G\xbd\xd3[\x08\x13c\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x01
+  s>     \r\n
+  s>     3\r\n
+  s>     foo
+  s>     \r\n
+  s>     f\r\n
+  s>     foo revision 1\n
+  s>     \r\n
+  s>     0\r\n
+  s>     \r\n
diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
--- a/mercurial/wireproto.py
+++ b/mercurial/wireproto.py
@@ -9,6 +9,7 @@ 
 
 import hashlib
 import os
+import struct
 import tempfile
 
 from .i18n import _
@@ -1132,3 +1133,59 @@ 
                 bundler.newpart('error:pushraced',
                                 [('message', util.forcebytestr(exc))])
             return streamres_legacy(gen=bundler.getchunks())
+
+@wireprotocommand('exp-revfilesdata-001', 'node',
+                  permission='pull')
+def revfilesdata(repo, proto, node):
+    """Obtain file data for a particular revision.
+
+    Given a node, emit metadata about files in that revision and their data.
+
+    TODO support receiving a narrow spec, integrating with a matcher.
+    TODO only expose to transport version 2
+    """
+    if not repo.ui.configbool('experimental', 'server.revfilesdata'):
+        return wireprototypes.ooberror(_('revfilesdata wire protocol command '
+                                         'is not enabled'))
+
+    if len(node) != 40:
+        return wireprototypes.ooberror(_('nodes argument must be 40 bytes'))
+
+    try:
+        ctx = repo[bin(node)]
+    except error.RepoLookupError:
+        return wireprototypes.ooberror(_('unknown node: %s') % node)
+
+    pathflags = {}
+
+    def makeentries():
+        for (path, node, flags) in ctx.manifest().iterentries():
+            pathflags[path] = flags
+            yield path, node
+
+    results = repo.filesstore.resolvefilesdata(makeentries())
+
+    # Output consists of structs followed by raw data.
+    s = struct.Struct(r'<20sHQB')
+
+    def emitdata():
+        for result, path, node, data in results:
+            flags = pathflags[path]
+            del pathflags[path]
+
+            if result == 'ok':
+                rawflag = 0
+                if b'x' in flags:
+                    rawflag |= 1
+                if b'l' in flags:
+                    rawflag |= 2
+
+                yield s.pack(node, len(path), len(data), rawflag)
+                yield path
+                yield data
+
+            else:
+                raise error.ProgrammingError('do not yet handle %s results' %
+                                             result)
+
+    return wireprototypes.streamres(emitdata())
diff --git a/mercurial/help/internals/wireprotocol.txt b/mercurial/help/internals/wireprotocol.txt
--- a/mercurial/help/internals/wireprotocol.txt
+++ b/mercurial/help/internals/wireprotocol.txt
@@ -1247,6 +1247,37 @@ 
 
 The return type is a ``string``.
 
+exp-revfilesdata-001
+--------------------
+
+**(Experimental and subject to behavior changes)**
+
+This command allows obtaining the fulltext data of the files in a
+specific revision.
+
+The ``node`` argument defines the revision whose file data is to
+be retrieved.
+
+The response is a stream consisting of a series of file data records.
+Each record begins with a 31 byte struct. The struct contains:
+
+* 20 bytes file node.
+* 16-bit unsigned little-endian integer defining the size of the file
+  name.
+* 64-bit unsigned little-endian integer defining the size of the file
+  data.
+* 1 byte containing file flags.
+
+The file flags byte has the ``0x01`` bit set if the file is executable.
+The ``0x02`` bit is set if the file is a symlink. If a symlink, the raw
+file data refers to the target of the symlink.
+
+Following that struct is the raw filename of the file. This is a raw
+byte string and has no encoding (Mercurial stores filenames as binary
+byte sequences). Following the filename is the raw file data.
+Following the raw file data is the next file record struct, or end of
+stream.
+
 getbundle
 ---------
 
diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -574,6 +574,9 @@ 
 coreconfigitem('experimental', 'update.atomic-file',
     default=False,
 )
+coreconfigitem('experimental', 'server.revfilesdata',
+    default=False,
+)
 coreconfigitem('experimental', 'sshpeer.advertise-v2',
     default=False,
 )