Patchwork D2836: wireproto: define permissions-based routing of HTTPv2 wire protocol

login
register
mail settings
Submitter phabricator
Date March 13, 2018, 2:16 a.m.
Message ID <differential-rev-PHID-DREV-35spdd3eah43d7e4swfb-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29427/
State Superseded
Headers show

Comments

phabricator - March 13, 2018, 2:16 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Now that we have a scaffolding for serving version 2 of the HTTP
  protocol, let's start implementing it.
  
  A good place to start is URL routing and basic request processing
  semantics. We can focus on content types, capabilities detect, etc
  later.
  
  Version 2 of the HTTP wire protocol encodes the needed permissions
  of the request in the URL path. The reasons for this are documented
  in the added documentation. In short, a) it makes it really easy and
  fail proof for server administrators to implement path-based
  authentication and b) it will enable clients to realize very early in
  a server exchange that authentication will be required to complete
  the operation. This latter point avoids all kinds of complexity and
  problems, like dealing with Expect: 100-continue and clients finding
  out later during `hg push` that they need to provide authentication.
  This will avoid the current badness where clients send a full bundle,
  get an HTTP 403, provide authentication, then retransmit the bundle.
  
  In order to implement command checking, we needed to implement a
  protocol handler for the new wire protocol. Our handler is just
  small enough to run the code we've implemented.
  
  Tests for the defined functionality have been added.
  
  I very much want to refactor the permissions checking code and define
  a better response format. But this can be done later. Nothing is
  covered by backwards compatibility at this point.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2836

AFFECTED FILES
  mercurial/help/internals/wireprotocol.txt
  mercurial/wireprotoserver.py
  tests/test-http-api-httpv2.t

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 19, 2018, 11:38 p.m.
indygreg planned changes to this revision.
indygreg added a comment.


  I'm going to refactor tests for this to deal with an issue I had testing things on my MBP.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2836

To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 23, 2018, 3:35 p.m.
yuja added inline comments.

INLINE COMMENTS

> wireprotoserver.py:284
> +        res.headers[b'Content-Type'] = b'text/plain'
> +        res.setbodybytes(_('HTTP version 2 API handler'))
> +        return

Just a nit: No `_()` is needed because we don't know client language nor character encoding.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2836

To: indygreg, #hg-reviewers, durin42
Cc: yuja, mercurial-devel

Patch

diff --git a/tests/test-http-api-httpv2.t b/tests/test-http-api-httpv2.t
--- a/tests/test-http-api-httpv2.t
+++ b/tests/test-http-api-httpv2.t
@@ -1,3 +1,5 @@ 
+  $ HTTPV2=exp-http-v2-0001
+
   $ hg init server
   $ cat > server/.hg/hgrc << EOF
   > [experimental]
@@ -8,7 +10,7 @@ 
 
 HTTP v2 protocol not enabled by default
 
-  $ get-with-headers.py $LOCALIP:$HGPORT api/exp-http-v2-0001 -
+  $ get-with-headers.py $LOCALIP:$HGPORT api/$HTTPV2 -
   404 Not Found
   content-length: 33
   content-type: text/plain
@@ -28,11 +30,100 @@ 
   $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
   $ cat hg.pid > $DAEMON_PIDS
 
-Requests simply echo their path (for now)
+Request to read-only command works out of the box
 
-  $ get-with-headers.py $LOCALIP:$HGPORT api/exp-http-v2-0001/path1/path2 -
+  $ get-with-headers.py $LOCALIP:$HGPORT api/$HTTPV2/ro/known -
   200 OK
-  content-length: 12
+  content-length: 9
+  content-type: text/plain
+  
+  ro/known/ (no-eol)
+
+Request to unknown command yields 404
+
+  $ get-with-headers.py $LOCALIP:$HGPORT api/$HTTPV2/ro/badcommand -
+  404 Not Found
+  content-length: 42
   content-type: text/plain
   
-  path1/path2/ (no-eol)
+  unknown wire protocol command: badcommand
+  [1]
+
+Request to read-write command fails because server is read-only by default
+
+GET request not allowed
+
+  $ get-with-headers.py $LOCALIP:$HGPORT api/$HTTPV2/rw/known -
+  405 push requires POST request
+  content-length: 17
+  
+  permission denied (no-eol)
+  [1]
+
+Even for unknown commands
+
+  $ get-with-headers.py $LOCALIP:$HGPORT api/$HTTPV2/rw/badcommand -
+  405 push requires POST request
+  content-length: 17
+  
+  permission denied (no-eol)
+  [1]
+
+
+SSL required by default
+
+  $ get-with-headers.py --method POST $LOCALIP:$HGPORT api/$HTTPV2/rw/known -
+  403 ssl required
+  content-length: 17
+  
+  permission denied (no-eol)
+  [1]
+
+Restart server to allow non-ssl read-write operations
+
+  $ killdaemons.py
+  $ cat > server/.hg/hgrc << EOF
+  > [experimental]
+  > web.apiserver = true
+  > web.api.http-v2 = true
+  > [web]
+  > push_ssl = false
+  > EOF
+
+  $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+  $ get-with-headers.py --method POST $LOCALIP:$HGPORT api/$HTTPV2/rw/known -
+  401 push not authorized
+  content-length: 17
+  
+  permission denied (no-eol)
+  [1]
+
+  $ killdaemons.py
+  $ cat > server/.hg/hgrc << EOF
+  > [experimental]
+  > web.apiserver = true
+  > web.api.http-v2 = true
+  > [web]
+  > push_ssl = false
+  > allow-push = *
+  > EOF
+
+  $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
+  $ cat hg.pid > $DAEMON_PIDS
+
+  $ get-with-headers.py --method POST $LOCALIP:$HGPORT api/$HTTPV2/rw/known -
+  200 OK
+  content-length: 9
+  content-type: text/plain
+  
+  rw/known/ (no-eol)
+
+  $ get-with-headers.py --method POST $LOCALIP:$HGPORT api/$HTTPV2/rw/badcommand -
+  404 Not Found
+  content-length: 42
+  content-type: text/plain
+  
+  unknown wire protocol command: badcommand
+  [1]
diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -272,6 +272,64 @@ 
                                    req.dispatchparts[2:])
 
 def _handlehttpv2request(rctx, req, res, checkperm, urlparts):
+    from .hgweb import common as hgwebcommon
+
+    # URL space looks like: <permissions>/<command>, where <permission> can
+    # be ``ro`` or ``rw`` to signal read-only or read-write, respectively.
+
+    # Root URL does nothing meaningful... yet.
+    if not urlparts:
+        res.status = b'200 OK'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('HTTP version 2 API handler'))
+        return
+
+    if len(urlparts) == 1:
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('do not know how to process %s\n') %
+                         req.dispatchpath)
+        return
+
+    permission, command = urlparts[0:2]
+
+    if permission not in (b'ro', b'rw'):
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('unknown permission: %s') % permission)
+        return
+
+    # At some point we'll want to use our own API instead of recycling the
+    # behavior of version 1 of the wire protocol...
+    # TODO return reasonable responses - not responses that overload the
+    # HTTP status line message for error reporting.
+    try:
+        checkperm(rctx, req, 'pull' if permission == b'ro' else 'push')
+    except hgwebcommon.ErrorResponse as e:
+        res.status = hgwebcommon.statusmessage(e.code, pycompat.bytestr(e))
+        for k, v in e.headers:
+            res.headers[k] = v
+        res.setbodybytes('permission denied')
+        return
+
+    if command not in wireproto.commands:
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('unknown wire protocol command: %s\n') % command)
+        return
+
+    repo = rctx.repo
+    ui = repo.ui
+
+    proto = httpv2protocolhandler(req, ui)
+
+    if not wireproto.commands.commandavailable(command, proto):
+        res.status = b'404 Not Found'
+        res.headers[b'Content-Type'] = b'text/plain'
+        res.setbodybytes(_('invalid wire protocol command: %s') % command)
+        return
+
+    # We don't do anything meaningful yet.
     res.status = b'200 OK'
     res.headers[b'Content-Type'] = b'text/plain'
     res.setbodybytes(b'/'.join(urlparts) + b'/')
@@ -284,6 +342,34 @@ 
     },
 }
 
+class httpv2protocolhandler(wireprototypes.baseprotocolhandler):
+    def __init__(self, req, ui):
+        self._req = req
+        self._ui = ui
+
+    @property
+    def name(self):
+        return HTTPV2
+
+    def getargs(self, args):
+        raise NotImplementedError
+
+    def forwardpayload(self, fp):
+        raise NotImplementedError
+
+    @contextlib.contextmanager
+    def mayberedirectstdio(self):
+        raise NotImplementedError
+
+    def client(self):
+        raise NotImplementedError
+
+    def addcapabilities(self, repo, caps):
+        raise NotImplementedError
+
+    def checkperm(self, perm):
+        raise NotImplementedError
+
 def _httpresponsetype(ui, req, prefer_uncompressed):
     """Determine the appropriate response type and compression settings.
 
diff --git a/mercurial/help/internals/wireprotocol.txt b/mercurial/help/internals/wireprotocol.txt
--- a/mercurial/help/internals/wireprotocol.txt
+++ b/mercurial/help/internals/wireprotocol.txt
@@ -144,6 +144,46 @@ 
 ``application/mercurial-0.*`` media type and the HTTP response is typically
 using *chunked transfer* (``Transfer-Encoding: chunked``).
 
+HTTP Version 2 Transport
+------------------------
+
+**Experimental - feature under active development**
+
+Version 2 of the HTTP protocol is exposed under the ``/api/*`` URL space.
+It's final API name is not yet formalized.
+
+Commands are triggered by sending HTTP requests against URLs of the
+form ``<permission>/<command>``, where ``<permission>`` is ``ro`` or
+``rw``, meaning read-only and read-write, respectively and ``<command>``
+is a named wire protocol command.
+
+Commands that modify repository state in meaningful ways MUST NOT be
+exposed under the ``ro`` URL prefix. All available commands MUST be
+available under the ``rw`` URL prefix.
+
+Server adminstrators MAY implement blanket HTTP authentication keyed
+off the URL prefix. For example, a server may require authentication
+for all ``rw/*`` URLs and let unauthenticated requests to ``ro/*``
+URL proceed. A server MAY issue an HTTP 401, 403, or 407 response
+in accordance with RFC 7235. Clients SHOULD recognize the HTTP Basic
+(RFC 7617) and Digest (RFC 7616) authentication schemes. Clients SHOULD
+make an attempt to recognize unknown schemes using the
+``WWW-Authenticate`` response header on a 401 response, as defined by
+RFC 7235.
+
+Read-only commands are accessible under ``rw/*`` URLs so clients can
+signal the intent of the operation very early in the connection
+lifecycle. For example, a ``push`` operation - which consists of
+various read-only commands mixed with at least one read-write command -
+can perform all commands against ``rw/*`` URLs so that any server-side
+authentication requirements are discovered upon attempting the first
+command - not potentially several commands into the exchange. This
+allows clients to fail faster or prompt for credentials as soon as the
+exchange takes place. This provides a better end-user experience.
+
+Requests to unknown commands or URLS result in an HTTP 404.
+TODO formally define response type, how error is communicated, etc.
+
 SSH Protocol
 ============