Patchwork D2740: wireprotoserver: move all wire protocol handling logic out of hgweb

login
register
mail settings
Submitter phabricator
Date March 9, 2018, 1:06 a.m.
Message ID <differential-rev-PHID-DREV-ei2mtmp6btyergyn44ay-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29146/
State Superseded
Headers show

Comments

phabricator - March 9, 2018, 1:06 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Previous patches from several days ago worked to isolate processing
  of HTTP wire protocol requests to wireprotoserver. We still had a
  little logic in hgweb. If feels like the right time to finish the
  job.
  
  This commit moves WSGI request servicing from hgweb to wireprotoserver.
  The ugly dict holding the parsed request is no more. I think the new
  code is cleaner.
  
  As part of this, we now process wire protocol requests before the
  block to obtain the "query" variable. This makes it clear that this
  wonky "query" variable is not used by the wire protocol.
  
  The wonkiest part about this code is the HTTP 404. I'm actually not
  sure what all is going on here. It looks like the code is trying to
  prevent URL with path components that specify a command from not
  working. That part I grok. What I don't grok is why we need to send
  a 404. I would think it would be OK to no-op and let another handler
  try to service the request. But if we do this, we get some subrepo
  test failures. So it looks like something is expecting the HTTP 404
  and reacting to it in a specific way. It /might/ be possible to
  change the behavior here. But it isn't something I'm comfortable
  doing because I don't understand the problem space.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2740

AFFECTED FILES
  mercurial/hgweb/hgweb_mod.py
  mercurial/wireprotoserver.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/wireprotoserver.py b/mercurial/wireprotoserver.py
--- a/mercurial/wireprotoserver.py
+++ b/mercurial/wireprotoserver.py
@@ -150,24 +150,29 @@ 
 def iscmd(cmd):
     return cmd in wireproto.commands
 
-def parsehttprequest(rctx, wsgireq, req, checkperm):
-    """Parse the HTTP request for a wire protocol request.
+def handlewsgirequest(rctx, wsgireq, req, checkperm):
+    """Possibly process a wire protocol request.
 
-    If the current request appears to be a wire protocol request, this
-    function returns a dict with details about that request, including
-    an ``abstractprotocolserver`` instance suitable for handling the
-    request. Otherwise, ``None`` is returned.
+    If the current request is a wire protocol request, the request is
+    processed by this function.
 
     ``wsgireq`` is a ``wsgirequest`` instance.
     ``req`` is a ``parsedrequest`` instance.
+
+    Returns a 2-tuple of (bool, response) where the 1st element indicates
+    whether the request was handled and the 2nd element is a return
+    value for a WSGI application (often a generator of bytes).
     """
+    # Avoid cycle involving hg module.
+    from .hgweb import common as hgwebcommon
+
     repo = rctx.repo
 
     # HTTP version 1 wire protocol requests are denoted by a "cmd" query
     # string parameter. If it isn't present, this isn't a wire protocol
     # request.
     if 'cmd' not in req.querystringdict:
-        return None
+        return False, None
 
     cmd = req.querystringdict['cmd'][0]
 
@@ -179,17 +184,32 @@ 
     # known wire protocol commands and it is less confusing for machine
     # clients.
     if not iscmd(cmd):
-        return None
+        return False, None
+
+    # The "cmd" query string argument is only valid on the root path of the
+    # repo. e.g. ``/?cmd=foo``, ``/repo?cmd=foo``. URL paths within the repo
+    # like ``/blah?cmd=foo`` are not allowed. So don't recognize the request
+    # in this case. We send an HTTP 404 for backwards compatibility reasons.
+    if req.dispatchpath:
+        res = _handlehttperror(
+            hgwebcommon.ErrorResponse(hgwebcommon.HTTP_NOT_FOUND), wsgireq,
+            cmd)
+
+        return True, res
 
     proto = httpv1protocolhandler(wsgireq, repo.ui,
                                   lambda perm: checkperm(rctx, wsgireq, perm))
 
-    return {
-        'cmd': cmd,
-        'proto': proto,
-        'dispatch': lambda: _callhttp(repo, wsgireq, proto, cmd),
-        'handleerror': lambda ex: _handlehttperror(ex, wsgireq, cmd),
-    }
+    # The permissions checker should be the only thing that can raise an
+    # ErrorResponse. It is kind of a layer violation to catch an hgweb
+    # exception here. So consider refactoring into a exception type that
+    # is associated with the wire protocol.
+    try:
+        res = _callhttp(repo, wsgireq, proto, cmd)
+    except hgwebcommon.ErrorResponse as e:
+        res = _handlehttperror(e, wsgireq, cmd)
+
+    return True, res
 
 def _httpresponsetype(ui, wsgireq, prefer_uncompressed):
     """Determine the appropriate response type and compression settings.
diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
--- a/mercurial/hgweb/hgweb_mod.py
+++ b/mercurial/hgweb/hgweb_mod.py
@@ -318,25 +318,16 @@ 
                                if h[0] != 'Content-Security-Policy']
             wsgireq.headers.append(('Content-Security-Policy', rctx.csp))
 
+        handled, res = wireprotoserver.handlewsgirequest(
+            rctx, wsgireq, req, self.check_perm)
+        if handled:
+            return res
+
         if req.havepathinfo:
             query = req.dispatchpath
         else:
             query = req.querystring.partition('&')[0].partition(';')[0]
 
-        # Route it to a wire protocol handler if it looks like a wire protocol
-        # request.
-        protohandler = wireprotoserver.parsehttprequest(rctx, wsgireq, req,
-                                                        self.check_perm)
-
-        if protohandler:
-            try:
-                if query:
-                    raise ErrorResponse(HTTP_NOT_FOUND)
-
-                return protohandler['dispatch']()
-            except ErrorResponse as inst:
-                return protohandler['handleerror'](inst)
-
         # translate user-visible url structure to internal structure
 
         args = query.split('/', 2)