Patchwork [2,of,2] server: support absolute urls (issue4877)

login
register
mail settings
Submitter timeless@mozdev.org
Date Oct. 2, 2015, 8:19 p.m.
Message ID <3afa079854db86531312.1443817148@waste.org>
Download mbox | patch
Permalink /patch/10741/
State Changes Requested
Delegated to: Augie Fackler
Headers show

Comments

timeless@mozdev.org - Oct. 2, 2015, 8:19 p.m.
# HG changeset patch
# User timeless@mozdev.org
# Date 1443813482 14400
#      Fri Oct 02 15:18:02 2015 -0400
# Node ID 3afa079854db8653131291e10db34cf3bc440e35
# Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
server: support absolute urls (issue4877)

rfc2616 sec 5.1.2 says they should work;
tests on OS X 10.6 / Python 2.6 fail because
hg+tinyproxy generate these urls.

curl with tinyproxy:
"GET /?cmd=capabilities HTTP/1.1" 200 -
hg with tinyproxy:
"GET https://localhost:.../?cmd=capabilities HTTP/1.1" 404 -
Pierre-Yves David - Oct. 2, 2015, 10:59 p.m.
On 10/02/2015 01:19 PM, timeless@mozdev.org wrote:
> # HG changeset patch
> # User timeless@mozdev.org
> # Date 1443813482 14400
> #      Fri Oct 02 15:18:02 2015 -0400
> # Node ID 3afa079854db8653131291e10db34cf3bc440e35
> # Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
> server: support absolute urls (issue4877)

Sounds legit and valid to me. I'll let our local HTTP weir^W guru look 
at them.

Should this go on Stable?
Yuya Nishihara - Oct. 4, 2015, 4:05 a.m.
On Fri, 02 Oct 2015 15:19:08 -0500, timeless@mozdev.org wrote:
> # HG changeset patch
> # User timeless@mozdev.org
> # Date 1443813482 14400
> #      Fri Oct 02 15:18:02 2015 -0400
> # Node ID 3afa079854db8653131291e10db34cf3bc440e35
> # Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
> server: support absolute urls (issue4877)
> 
> rfc2616 sec 5.1.2 says they should work;
> tests on OS X 10.6 / Python 2.6 fail because
> hg+tinyproxy generate these urls.
> 
> curl with tinyproxy:
> "GET /?cmd=capabilities HTTP/1.1" 200 -
> hg with tinyproxy:
> "GET https://localhost:.../?cmd=capabilities HTTP/1.1" 404 -
> 
> diff --git a/mercurial/hgweb/server.py b/mercurial/hgweb/server.py
> --- a/mercurial/hgweb/server.py
> +++ b/mercurial/hgweb/server.py
> @@ -7,6 +7,7 @@
>  # GNU General Public License version 2 or any later version.
>  
>  import os, sys, errno, urllib, BaseHTTPServer, socket, SocketServer, traceback
> +from urlparse import urlparse
>  from mercurial import util, error
>  from mercurial.hgweb import common
>  from mercurial.i18n import _
> @@ -90,9 +91,26 @@
>          self.do_POST()
>  
>      def do_hgweb(self):
> +        env = {}
> +        parsed = urlparse(self.path)
> +        if parsed.scheme and parsed.netloc:
> +            # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
> +            # To allow for transition to absoluteURIs in all requests in
> +            # future versions of HTTP, all HTTP/1.1 servers MUST accept the
> +            # absoluteURI form in requests, even though HTTP/1.1 clients will
> +            # only generate them in requests to proxies.

Just FYI, RFC7230 also says that, so this requirement seems still valid.

https://tools.ietf.org/html/rfc7230#section-5.3.2

> +            server = parsed.netloc
> +            self.path = self.path[self.path.find(server) + len(server):]

If the absolute URI is "http://http/foo", path will be "://http/foo".
Gregory Szorc - Oct. 4, 2015, 5:06 p.m.
On Fri, Oct 2, 2015 at 1:19 PM, <timeless@mozdev.org> wrote:

> # HG changeset patch
> # User timeless@mozdev.org
> # Date 1443813482 14400
> #      Fri Oct 02 15:18:02 2015 -0400
> # Node ID 3afa079854db8653131291e10db34cf3bc440e35
> # Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
> server: support absolute urls (issue4877)
>
> rfc2616 sec 5.1.2 says they should work;
> tests on OS X 10.6 / Python 2.6 fail because
> hg+tinyproxy generate these urls.
>
> curl with tinyproxy:
> "GET /?cmd=capabilities HTTP/1.1" 200 -
> hg with tinyproxy:
> "GET https://localhost:.../?cmd=capabilities HTTP/1.1" 404 -
>
> diff --git a/mercurial/hgweb/server.py b/mercurial/hgweb/server.py
> --- a/mercurial/hgweb/server.py
> +++ b/mercurial/hgweb/server.py
> @@ -7,6 +7,7 @@
>  # GNU General Public License version 2 or any later version.
>
>  import os, sys, errno, urllib, BaseHTTPServer, socket, SocketServer,
> traceback
> +from urlparse import urlparse
>  from mercurial import util, error
>  from mercurial.hgweb import common
>  from mercurial.i18n import _
> @@ -90,9 +91,26 @@
>          self.do_POST()
>
>      def do_hgweb(self):
> +        env = {}
> +        parsed = urlparse(self.path)
> +        if parsed.scheme and parsed.netloc:
> +            #
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2


This should be referencing RFC 7230 since it obsoletes 2616.


>
> +            # To allow for transition to absoluteURIs in all requests in
> +            # future versions of HTTP, all HTTP/1.1 servers MUST accept
> the
> +            # absoluteURI form in requests, even though HTTP/1.1 clients
> will
> +            # only generate them in requests to proxies.
> +            server = parsed.netloc
> +            self.path = self.path[self.path.find(server) + len(server):]
>

I'd feel better if this used the parsed URL object to extract the path.
Should just be concatenation of its path and optional query string. You
could also set "path" and "query" right here and move the _splitURI bit
below into an else block.


> +            colon = server.rfind(':')
> +            if colon > 1:
> +                server = server[:colon]
> +            # https://www.python.org/dev/peps/pep-0333/#environ-variables
> +            # Note, however, that HTTP_HOST , if present, should be used
> in
> +            # preference to SERVER_NAME for reconstructing the request
> URL.
> +            # See the URL Reconstruction section below for more detail.
> +            env['HTTP_HOST'] = server
>          path, query = _splitURI(self.path)
>
> -        env = {}
>          env['GATEWAY_INTERFACE'] = 'CGI/1.1'
>          env['REQUEST_METHOD'] = self.command
>          env['SERVER_NAME'] = self.server.server_name
> diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
> --- a/tests/test-hgweb.t
> +++ b/tests/test-hgweb.t
> @@ -16,6 +16,10 @@
>    $ hg serve -n test -p $HGPORT -d --pid-file=hg.pid -A access.log -E
> errors.log
>    $ cat hg.pid >> $DAEMON_PIDS
>
> +supports rfcXXX gets
> +  $ (get-with-headers.py localhost:$HGPORT --headeronly "http://localhost:
> $HGPORT/?cmd=capabilities")
> +  200 Script output follows
> +
>  manifest
>
>    $ (get-with-headers.py localhost:$HGPORT 'file/tip/?style=raw')
> @@ -334,7 +338,7 @@
>  Test the access/error files are opened in append mode
>
>    $ $PYTHON -c "print len(file('access.log').readlines()), 'log lines
> written'"
> -  14 log lines written
> +  15 log lines written
>
>  static file
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
timeless - Oct. 4, 2015, 8:41 p.m.
It isn't just query, since a ";" section is also allowed (and split out by
the module)...

I'll see about updating this mid-week
On Oct 4, 2015 1:07 PM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:

> On Fri, Oct 2, 2015 at 1:19 PM, <timeless@mozdev.org> wrote:
>
>> # HG changeset patch
>> # User timeless@mozdev.org
>> # Date 1443813482 14400
>> #      Fri Oct 02 15:18:02 2015 -0400
>> # Node ID 3afa079854db8653131291e10db34cf3bc440e35
>> # Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
>> server: support absolute urls (issue4877)
>>
>> rfc2616 sec 5.1.2 says they should work;
>> tests on OS X 10.6 / Python 2.6 fail because
>> hg+tinyproxy generate these urls.
>>
>> curl with tinyproxy:
>> "GET /?cmd=capabilities HTTP/1.1" 200 -
>> hg with tinyproxy:
>> "GET https://localhost:.../?cmd=capabilities HTTP/1.1" 404 -
>>
>> diff --git a/mercurial/hgweb/server.py b/mercurial/hgweb/server.py
>> --- a/mercurial/hgweb/server.py
>> +++ b/mercurial/hgweb/server.py
>> @@ -7,6 +7,7 @@
>>  # GNU General Public License version 2 or any later version.
>>
>>  import os, sys, errno, urllib, BaseHTTPServer, socket, SocketServer,
>> traceback
>> +from urlparse import urlparse
>>  from mercurial import util, error
>>  from mercurial.hgweb import common
>>  from mercurial.i18n import _
>> @@ -90,9 +91,26 @@
>>          self.do_POST()
>>
>>      def do_hgweb(self):
>> +        env = {}
>> +        parsed = urlparse(self.path)
>> +        if parsed.scheme and parsed.netloc:
>> +            #
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
>
>
> This should be referencing RFC 7230 since it obsoletes 2616.
>
>
>>
>> +            # To allow for transition to absoluteURIs in all requests in
>> +            # future versions of HTTP, all HTTP/1.1 servers MUST accept
>> the
>> +            # absoluteURI form in requests, even though HTTP/1.1 clients
>> will
>> +            # only generate them in requests to proxies.
>> +            server = parsed.netloc
>> +            self.path = self.path[self.path.find(server) + len(server):]
>>
>
> I'd feel better if this used the parsed URL object to extract the path.
> Should just be concatenation of its path and optional query string. You
> could also set "path" and "query" right here and move the _splitURI bit
> below into an else block.
>
>
>> +            colon = server.rfind(':')
>> +            if colon > 1:
>> +                server = server[:colon]
>> +            #
>> https://www.python.org/dev/peps/pep-0333/#environ-variables
>> +            # Note, however, that HTTP_HOST , if present, should be used
>> in
>> +            # preference to SERVER_NAME for reconstructing the request
>> URL.
>> +            # See the URL Reconstruction section below for more detail.
>> +            env['HTTP_HOST'] = server
>>          path, query = _splitURI(self.path)
>>
>> -        env = {}
>>          env['GATEWAY_INTERFACE'] = 'CGI/1.1'
>>          env['REQUEST_METHOD'] = self.command
>>          env['SERVER_NAME'] = self.server.server_name
>> diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
>> --- a/tests/test-hgweb.t
>> +++ b/tests/test-hgweb.t
>> @@ -16,6 +16,10 @@
>>    $ hg serve -n test -p $HGPORT -d --pid-file=hg.pid -A access.log -E
>> errors.log
>>    $ cat hg.pid >> $DAEMON_PIDS
>>
>> +supports rfcXXX gets
>> +  $ (get-with-headers.py localhost:$HGPORT --headeronly
>> "http://localhost:$HGPORT/?cmd=capabilities")
>> +  200 Script output follows
>> +
>>  manifest
>>
>>    $ (get-with-headers.py localhost:$HGPORT 'file/tip/?style=raw')
>> @@ -334,7 +338,7 @@
>>  Test the access/error files are opened in append mode
>>
>>    $ $PYTHON -c "print len(file('access.log').readlines()), 'log lines
>> written'"
>> -  14 log lines written
>> +  15 log lines written
>>
>>  static file
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> https://selenic.com/mailman/listinfo/mercurial-devel
>>
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
>
Augie Fackler - Oct. 7, 2015, 7:14 p.m.
On Sun, Oct 04, 2015 at 04:41:11PM -0400, timeless wrote:
> It isn't just query, since a ";" section is also allowed (and split out by
> the module)...
>
> I'll see about updating this mid-week

I assume I should drop this series and await a resend, yes?

> On Oct 4, 2015 1:07 PM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
>
> > On Fri, Oct 2, 2015 at 1:19 PM, <timeless@mozdev.org> wrote:
> >
> >> # HG changeset patch
> >> # User timeless@mozdev.org
> >> # Date 1443813482 14400
> >> #      Fri Oct 02 15:18:02 2015 -0400
> >> # Node ID 3afa079854db8653131291e10db34cf3bc440e35
> >> # Parent  8bd5471f48ec17944c641e3b7fbb8a2d9a8f1865
> >> server: support absolute urls (issue4877)
> >>
> >> rfc2616 sec 5.1.2 says they should work;
> >> tests on OS X 10.6 / Python 2.6 fail because
> >> hg+tinyproxy generate these urls.
> >>
> >> curl with tinyproxy:
> >> "GET /?cmd=capabilities HTTP/1.1" 200 -
> >> hg with tinyproxy:
> >> "GET https://localhost:.../?cmd=capabilities HTTP/1.1" 404 -
> >>
> >> diff --git a/mercurial/hgweb/server.py b/mercurial/hgweb/server.py
> >> --- a/mercurial/hgweb/server.py
> >> +++ b/mercurial/hgweb/server.py
> >> @@ -7,6 +7,7 @@
> >>  # GNU General Public License version 2 or any later version.
> >>
> >>  import os, sys, errno, urllib, BaseHTTPServer, socket, SocketServer,
> >> traceback
> >> +from urlparse import urlparse
> >>  from mercurial import util, error
> >>  from mercurial.hgweb import common
> >>  from mercurial.i18n import _
> >> @@ -90,9 +91,26 @@
> >>          self.do_POST()
> >>
> >>      def do_hgweb(self):
> >> +        env = {}
> >> +        parsed = urlparse(self.path)
> >> +        if parsed.scheme and parsed.netloc:
> >> +            #
> >> http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
> >
> >
> > This should be referencing RFC 7230 since it obsoletes 2616.
> >
> >
> >>
> >> +            # To allow for transition to absoluteURIs in all requests in
> >> +            # future versions of HTTP, all HTTP/1.1 servers MUST accept
> >> the
> >> +            # absoluteURI form in requests, even though HTTP/1.1 clients
> >> will
> >> +            # only generate them in requests to proxies.
> >> +            server = parsed.netloc
> >> +            self.path = self.path[self.path.find(server) + len(server):]
> >>
> >
> > I'd feel better if this used the parsed URL object to extract the path.
> > Should just be concatenation of its path and optional query string. You
> > could also set "path" and "query" right here and move the _splitURI bit
> > below into an else block.
> >
> >
> >> +            colon = server.rfind(':')
> >> +            if colon > 1:
> >> +                server = server[:colon]
> >> +            #
> >> https://www.python.org/dev/peps/pep-0333/#environ-variables
> >> +            # Note, however, that HTTP_HOST , if present, should be used
> >> in
> >> +            # preference to SERVER_NAME for reconstructing the request
> >> URL.
> >> +            # See the URL Reconstruction section below for more detail.
> >> +            env['HTTP_HOST'] = server
> >>          path, query = _splitURI(self.path)
> >>
> >> -        env = {}
> >>          env['GATEWAY_INTERFACE'] = 'CGI/1.1'
> >>          env['REQUEST_METHOD'] = self.command
> >>          env['SERVER_NAME'] = self.server.server_name
> >> diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
> >> --- a/tests/test-hgweb.t
> >> +++ b/tests/test-hgweb.t
> >> @@ -16,6 +16,10 @@
> >>    $ hg serve -n test -p $HGPORT -d --pid-file=hg.pid -A access.log -E
> >> errors.log
> >>    $ cat hg.pid >> $DAEMON_PIDS
> >>
> >> +supports rfcXXX gets
> >> +  $ (get-with-headers.py localhost:$HGPORT --headeronly
> >> "http://localhost:$HGPORT/?cmd=capabilities")
> >> +  200 Script output follows
> >> +
> >>  manifest
> >>
> >>    $ (get-with-headers.py localhost:$HGPORT 'file/tip/?style=raw')
> >> @@ -334,7 +338,7 @@
> >>  Test the access/error files are opened in append mode
> >>
> >>    $ $PYTHON -c "print len(file('access.log').readlines()), 'log lines
> >> written'"
> >> -  14 log lines written
> >> +  15 log lines written
> >>
> >>  static file
> >>
> >> _______________________________________________
> >> Mercurial-devel mailing list
> >> Mercurial-devel@selenic.com
> >> https://selenic.com/mailman/listinfo/mercurial-devel
> >>
> >
> >
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
> >
> >

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/hgweb/server.py b/mercurial/hgweb/server.py
--- a/mercurial/hgweb/server.py
+++ b/mercurial/hgweb/server.py
@@ -7,6 +7,7 @@ 
 # GNU General Public License version 2 or any later version.
 
 import os, sys, errno, urllib, BaseHTTPServer, socket, SocketServer, traceback
+from urlparse import urlparse
 from mercurial import util, error
 from mercurial.hgweb import common
 from mercurial.i18n import _
@@ -90,9 +91,26 @@ 
         self.do_POST()
 
     def do_hgweb(self):
+        env = {}
+        parsed = urlparse(self.path)
+        if parsed.scheme and parsed.netloc:
+            # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
+            # To allow for transition to absoluteURIs in all requests in
+            # future versions of HTTP, all HTTP/1.1 servers MUST accept the
+            # absoluteURI form in requests, even though HTTP/1.1 clients will
+            # only generate them in requests to proxies.
+            server = parsed.netloc
+            self.path = self.path[self.path.find(server) + len(server):]
+            colon = server.rfind(':')
+            if colon > 1:
+                server = server[:colon]
+            # https://www.python.org/dev/peps/pep-0333/#environ-variables
+            # Note, however, that HTTP_HOST , if present, should be used in
+            # preference to SERVER_NAME for reconstructing the request URL.
+            # See the URL Reconstruction section below for more detail.
+            env['HTTP_HOST'] = server
         path, query = _splitURI(self.path)
 
-        env = {}
         env['GATEWAY_INTERFACE'] = 'CGI/1.1'
         env['REQUEST_METHOD'] = self.command
         env['SERVER_NAME'] = self.server.server_name
diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
--- a/tests/test-hgweb.t
+++ b/tests/test-hgweb.t
@@ -16,6 +16,10 @@ 
   $ hg serve -n test -p $HGPORT -d --pid-file=hg.pid -A access.log -E errors.log
   $ cat hg.pid >> $DAEMON_PIDS
 
+supports rfcXXX gets
+  $ (get-with-headers.py localhost:$HGPORT --headeronly "http://localhost:$HGPORT/?cmd=capabilities")
+  200 Script output follows
+
 manifest
 
   $ (get-with-headers.py localhost:$HGPORT 'file/tip/?style=raw')
@@ -334,7 +338,7 @@ 
 Test the access/error files are opened in append mode
 
   $ $PYTHON -c "print len(file('access.log').readlines()), 'log lines written'"
-  14 log lines written
+  15 log lines written
 
 static file