Patchwork [6,of,6] hgweb: send aggressive Cache-Control header for immutable requests

Submitter Gregory Szorc
Date April 1, 2017, 7:29 a.m.
Message ID <2768152f91137977b8dd.1491031750@ubuntu-vm-main>
Permalink /patch/19891/
State Changes Requested

Comments

Gregory Szorc - April 1, 2017, 7:29 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1491031522 25200
#      Sat Apr 01 00:25:22 2017 -0700
# Node ID 2768152f91137977b8dd6332f8932a54451f7445
# Parent  a5df20af0f27612eee26f04187bd92b6772a5c7e
hgweb: send aggressive Cache-Control header for immutable requests

Now that we have HTTP requests hitting our "staticimmutable"
web command, which serves content that is guaranteed immutable,
we can start serving aggressive HTTP caching response headers that
tell clients to unconditionally use cached responses without
checking with the server first. This is done in the form of a
Cache-Control header with max-age set to 1 year.
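
As a rough illustration of where the 1 year value comes from, the sketch
below derives the max-age figure in seconds and builds the header tuple.
The names ONE_YEAR_SECONDS and cache_control_header are hypothetical and
are not part of hgweb.

```python
from datetime import timedelta

# One non-leap year in seconds; this is the max-age value used in the patch.
ONE_YEAR_SECONDS = int(timedelta(days=365).total_seconds())


def cache_control_header(max_age=ONE_YEAR_SECONDS):
    """Return a (name, value) tuple for a Cache-Control response header.

    Hypothetical helper for illustration only; hgweb appends the tuple
    to req.headers directly.
    """
    return ('Cache-Control', 'max-age=%d' % max_age)
```

Calling cache_control_header() with no arguments yields the same header
value that the patch hard-codes.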

The impact of this change is that web browsers only perform 1 HTTP
request on subsequent page loads (for the URL they are loading). All
static assets (JavaScript, CSS, and images) are serviced from the
browser cache without checking with the server. This avoids extra
round trips and loads pages noticeably faster, even from an HTTP
server on localhost.

The only remaining improvement from an HTTP caching perspective is to
add the "immutable" Cache-Control directive, which prevents
conditional HTTP requests when the user refreshes the page. However,
this directive isn't yet standardized and only Firefox supports it.
Its intended use case is web sites whose content updates periodically
and whose users refresh constantly to get the latest content, as
happens on sites like Facebook. That probably isn't relevant to
hgweb, so I don't think it is worth adding "immutable" at this time.

This commit also updates the ETag header to be content based
(a "strong" validator) instead of using the default mtime-based
"weak" validator. In the case of a manual refresh request from a
client with a populated cache, the client will send an If-None-Match
request header and the server will respond with 304 Not Modified if
the ETag values match (which they should, since URLs are immutable).
This avoids transferring file content during refreshes (but doesn't
avoid the round trip itself).
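
The conditional request flow described above can be sketched outside of
hgweb roughly as follows. This is a minimal standalone illustration, not
the code in this patch: respond_immutable is a hypothetical helper, the
use of SHA-1 for the content hash is an assumption, and returning the
caching headers on the 304 response is a choice made for the sketch.

```python
import hashlib


def respond_immutable(data, request_headers):
    """Return (status, headers, body) for an immutable static resource.

    Hypothetical helper; hgweb uses req.respond() and ErrorResponse.
    """
    # Strong validator: a quoted hash of the content (SHA-1 is assumed here).
    etag = '"%s"' % hashlib.sha1(data).hexdigest()
    headers = [
        ('ETag', etag),
        ('Cache-Control', 'max-age=31536000'),
    ]

    # Conditional request whose validator matches: skip the response body.
    # Like the patch, this only handles a single exact ETag value.
    if request_headers.get('If-None-Match') == etag:
        return 304, headers, b''

    return 200, headers, data


# First request has no validator; the refresh replays the ETag it received.
status, headers, body = respond_immutable(b'logo bytes', {})
refresh = respond_immutable(b'logo bytes',
                            {'If-None-Match': dict(headers)['ETag']})
```

Here the first call returns a 200 with the full body, and the refresh
returns a 304 with an empty body, mirroring the two responses exercised
by the --twice test in this patch.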

Patch

diff --git a/mercurial/hgweb/common.py b/mercurial/hgweb/common.py
--- a/mercurial/hgweb/common.py
+++ b/mercurial/hgweb/common.py
@@ -212,6 +212,27 @@  def staticfile(directory, fname, req, im
             if hash != immutablehash:
                 raise ErrorResponse(HTTP_NOT_FOUND)
 
+            # Hash matches and content is immutable. Do content-based caching.
+
+            # Strip ETag header added by common dispatch code, as it is
+            # using a weak validator (mtime) and isn't as robust as content
+            # hashing.
+            req.headers = [t for t in req.headers if t[0].lower() != 'etag']
+
+            # Conditional HTTP request for this hash. Avoid sending response
+            # body. This likely occurs only after a manual browser refresh,
+            # as clients shouldn't request URLs they've already cached
+            # otherwise.
+            if req.env.get('HTTP_IF_NONE_MATCH') == '"%s"' % hash:
+                raise ErrorResponse(HTTP_NOT_MODIFIED)
+
+            # Set ETag with strong validator (content hash). This allows
+            # refresh requests to be conditional.
+            req.headers.append(('ETag', '"%s"' % hash))
+
+            # Cache unconditionally for 1 year.
+            req.headers.append(('Cache-Control', 'max-age=31536000'))
+
         req.respond(HTTP_OK, ct, body=data)
     except TypeError:
         raise ErrorResponse(HTTP_SERVER_ERROR, 'illegal filename')
diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
--- a/tests/test-hgweb.t
+++ b/tests/test-hgweb.t
@@ -236,6 +236,21 @@  staticimmutable with a bogus hash should
   
   [1]
 
+staticimmutable for a valid hash should issue Cache-Control and content-based ETag header
+
+  $ get-with-headers.py --headeronly localhost:$HGPORT 'staticimmutable/e8690644d0bb4d35db4a08e469905a0c5ce363b7/hglogo.png' etag cache-control
+  200 Script output follows
+  etag: "e8690644d0bb4d35db4a08e469905a0c5ce363b7"
+  cache-control: max-age=31536000
+
+and If-None-Match should result in 304
+
+  $ get-with-headers.py --twice --headeronly localhost:$HGPORT 'staticimmutable/e8690644d0bb4d35db4a08e469905a0c5ce363b7/hglogo.png' etag cache-control
+  200 Script output follows
+  etag: "e8690644d0bb4d35db4a08e469905a0c5ce363b7"
+  cache-control: max-age=31536000
+  304 Not Modified
+
 should give a 404 - bad revision
 
   $ get-with-headers.py localhost:$HGPORT 'file/spam/foo?style=raw'
@@ -461,7 +476,7 @@  stop and restart
 Test the access/error files are opened in append mode
 
   $ $PYTHON -c "print len(file('access.log').readlines()), 'log lines written'"
-  16 log lines written
+  19 log lines written
 
 static file