Patchwork [1,of,3] hgweb: allow urlencoded forward slashes in specified revisions

login
register
mail settings
Submitter Anton Shestakov
Date July 13, 2015, 12:18 p.m.
Message ID <4b2713531ee8955fd282.1436789938@neuro>
Download mbox | patch
Permalink /patch/9960/
State Changes Requested
Headers show

Comments

Anton Shestakov - July 13, 2015, 12:18 p.m.
# HG changeset patch
# User Anton Shestakov <av6@dwimlabs.net>
# Date 1436688417 -28800
#      Sun Jul 12 16:06:57 2015 +0800
# Node ID 4b2713531ee8955fd282ae77cedfc3308c7fa98a
# Parent  648323f41a89619d9eeeb7287213378c340866c8
hgweb: allow urlencoded forward slashes in specified revisions

It's possible to have a branch/tag/bookmark with all kinds of special
characters, such as {}/\!?. While not very conveniently, symbolic revisions
with such characters work from command line if user correctly quotes the
characters. These characters also work in hgweb, when they are properly
encoded, with one exception: '/' (forward slash, urlencoded as '%2F'), which
was getting decoded before hgweb could parse it as a part of PATH_INFO.
Because of that, hgweb was seeing it as any other forward slash, that is, as
just another url parts separator.

For example, if user wanted to see the content of dir/file at bookmark
'feature/eggs', url could be: '/file/feature%2Feggs/dir/file'. But hgweb tried
to find a revision 'feature' and get contents of 'eggs/dir/file'.

To fix this, let's inspect the contents of REQUEST_URI (which, if present,
contains "raw" url), find '%2F' in revision identifier and then re-join parts
of PATH_INFO that were split on decoded '%2F' characters (as opposed to real
forward slashes).

I've also did node.replace('%2f', '/') to future-proof the case when (if?)
PATH_INFO contains '%2f'. I don't know if any CGI/WSGI server does this.
Yuya Nishihara - July 13, 2015, 1:41 p.m.
On Mon, 13 Jul 2015 20:18:58 +0800, Anton Shestakov wrote:
> # HG changeset patch
> # User Anton Shestakov <av6@dwimlabs.net>
> # Date 1436688417 -28800
> #      Sun Jul 12 16:06:57 2015 +0800
> # Node ID 4b2713531ee8955fd282ae77cedfc3308c7fa98a
> # Parent  648323f41a89619d9eeeb7287213378c340866c8
> hgweb: allow urlencoded forward slashes in specified revisions
> 
> It's possible to have a branch/tag/bookmark with all kinds of special
> characters, such as {}/\!?. While not very conveniently, symbolic revisions
> with such characters work from command line if user correctly quotes the
> characters. These characters also work in hgweb, when they are properly
> encoded, with one exception: '/' (forward slash, urlencoded as '%2F'), which
> was getting decoded before hgweb could parse it as a part of PATH_INFO.
> Because of that, hgweb was seeing it as any other forward slash, that is, as
> just another url parts separator.
> 
> For example, if user wanted to see the content of dir/file at bookmark
> 'feature/eggs', url could be: '/file/feature%2Feggs/dir/file'. But hgweb tried
> to find a revision 'feature' and get contents of 'eggs/dir/file'.
> 
> To fix this, let's inspect the contents of REQUEST_URI (which, if present,
> contains "raw" url), find '%2F' in revision identifier and then re-join parts
> of PATH_INFO that were split on decoded '%2F' characters (as opposed to real
> forward slashes).

It's a source of trouble to escape '/' in path component as '%2F'.
For example, if an hgweb is behind an NGINX reverse proxy, we can't see
the original encoded URI but the normalized version.

If we want to support "foo/bar" tags, I think we have to introduce a different
encoding scheme or ?query= syntax.
Anton Shestakov - July 13, 2015, 2:25 p.m.
On Mon, 13 Jul 2015 22:41:03 +0900
Yuya Nishihara <yuya@tcha.org> wrote:

> On Mon, 13 Jul 2015 20:18:58 +0800, Anton Shestakov wrote:
> > # HG changeset patch
> > # User Anton Shestakov <av6@dwimlabs.net>
> > # Date 1436688417 -28800
> > #      Sun Jul 12 16:06:57 2015 +0800
> > # Node ID 4b2713531ee8955fd282ae77cedfc3308c7fa98a
> > # Parent  648323f41a89619d9eeeb7287213378c340866c8
> > hgweb: allow urlencoded forward slashes in specified revisions
> > 
> > It's possible to have a branch/tag/bookmark with all kinds of
> > special characters, such as {}/\!?. While not very conveniently,
> > symbolic revisions with such characters work from command line if
> > user correctly quotes the characters. These characters also work in
> > hgweb, when they are properly encoded, with one exception:
> > '/' (forward slash, urlencoded as '%2F'), which was getting decoded
> > before hgweb could parse it as a part of PATH_INFO. Because of
> > that, hgweb was seeing it as any other forward slash, that is, as
> > just another url parts separator.
> > 
> > For example, if user wanted to see the content of dir/file at
> > bookmark 'feature/eggs', url could be:
> > '/file/feature%2Feggs/dir/file'. But hgweb tried to find a revision
> > 'feature' and get contents of 'eggs/dir/file'.
> > 
> > To fix this, let's inspect the contents of REQUEST_URI (which, if
> > present, contains "raw" url), find '%2F' in revision identifier and
> > then re-join parts of PATH_INFO that were split on decoded '%2F'
> > characters (as opposed to real forward slashes).
> 
> It's a source of trouble to escape '/' in path component as '%2F'.
> For example, if an hgweb is behind an NGINX reverse proxy, we can't
> see the original encoded URI but the normalized version.

Huh. Indeed, it didn't work on my nginx+gunicorn setup. Although I
don't think it's because of nginx - rather it's the WSGI server that
parses the raw url. In case of gunicorn, it doesn't pass REQUEST_URI,
but instead RAW_URI (which does contain raw %2F).

So option number one is to also try req.env['RAW_URI'].

> If we want to support "foo/bar" tags, I think we have to introduce a
> different encoding scheme or ?query= syntax.

?query= (or rather ?node=) syntax would work, but it would require
rewriting every place that passes revision context to another page (so
the majority of hgweb templates). Also, it doesn't look nice, which was
the point of introducing the human-friendly url scheme into hgweb.

But different encoding scheme is an interesting idea. I've tried
another solution for this: doubly-encoding slashes into %252F. That
seems to work (and is a suggested solution for urlencoded slashes in
Apache httpd), so it's an option number two, and if nobody has
objections, I will try and send a V2 with revescape filter that does
this.

Thanks for the feedback!

Patch

diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
--- a/mercurial/hgweb/hgweb_mod.py
+++ b/mercurial/hgweb/hgweb_mod.py
@@ -185,6 +185,21 @@  class hgweb(object):
             repo_parts = req.env.get('REPO_NAME', '').split('/')
             if parts[:len(repo_parts)] == repo_parts:
                 parts = parts[len(repo_parts):]
+            # Work around cases when revision contains '%2F' (an escaped '/'
+            # character), and CGI/WSGI server replaces it with a literal slash
+            # in PATH_INFO.
+            rawpath = req.env.get('REQUEST_URI', '').split('?', 1)[0]
+            if ('%2F' in rawpath.upper() and
+                '%2F' not in req.env['PATH_INFO'].upper()):
+                rawparts = rawpath.strip('/').split('/')
+                if rawparts[:len(repo_parts)] == repo_parts:
+                    rawparts = rawparts[len(repo_parts):]
+                # We only need revision from '/<cmd>/<revision>/<the rest>'
+                if len(rawparts) > 1:
+                    revstring = rawparts[1]
+                    count = revstring.upper().count('%2F')
+                    for i in range(count):
+                        parts[1:3] = [parts[1] + '%2F' + parts[2]]
             query = '/'.join(parts)
         else:
             query = req.env['QUERY_STRING'].split('&', 1)[0]
@@ -235,7 +250,7 @@  class hgweb(object):
                 req.form['file'] = ['/'.join(args)]
             else:
                 if args and args[0]:
-                    node = args.pop(0)
+                    node = args.pop(0).replace('%2F', '/').replace('%2f', '/')
                     req.form['node'] = [node]
                 if args:
                     req.form['file'] = args
diff --git a/tests/test-hgweb.t b/tests/test-hgweb.t
--- a/tests/test-hgweb.t
+++ b/tests/test-hgweb.t
@@ -10,6 +10,9 @@  Some tests for hgweb. Tests static files
   $ hg ci -Ambase
   adding da/foo
   adding foo
+  $ hg bookmark -r0 '@'
+  $ hg bookmark -r0 'a b c'
+  $ hg bookmark -r0 'd/e/f'
   $ hg serve -n test -p $HGPORT -d --pid-file=hg.pid -A access.log -E errors.log
   $ cat hg.pid >> $DAEMON_PIDS
 
@@ -264,7 +267,7 @@  try bad style
   <h2 class="breadcrumb"><a href="/">Mercurial</a> </h2>
   <h3>
    directory / @ 0:<a href="/rev/2ef0ac749a14">2ef0ac749a14</a>
-   <span class="tag">tip</span> 
+   <span class="tag">tip</span> <span class="tag">@</span> <span class="tag">a b c</span> <span class="tag">d/e/f</span> 
   </h3>
   
   <form class="search" action="/log">
@@ -557,6 +560,9 @@  phase changes are refreshed (issue4061)
   summary:     base
   branch:      default
   tag:         tip
+  bookmark:    @
+  bookmark:    a b c
+  bookmark:    d/e/f
   
   
   $ hg phase --draft tip
@@ -580,9 +586,30 @@  phase changes are refreshed (issue4061)
   user:        test
   date:        Thu, 01 Jan 1970 00:00:00 +0000
   summary:     base
+  bookmark:    @
+  bookmark:    a b c
+  bookmark:    d/e/f
   
   
 
+access bookmarks
+
+  $ get-with-headers.py localhost:$HGPORT 'rev/@?style=paper' | egrep '^200|changeset 0:'
+  200 Script output follows
+   changeset 0:<a href="/rev/2ef0ac749a14?style=paper">2ef0ac749a14</a>
+
+  $ get-with-headers.py localhost:$HGPORT 'rev/%40?style=paper' | egrep '^200|changeset 0:'
+  200 Script output follows
+   changeset 0:<a href="/rev/2ef0ac749a14?style=paper">2ef0ac749a14</a>
+
+  $ get-with-headers.py localhost:$HGPORT 'rev/a%20b%20c?style=paper' | egrep '^200|changeset 0:'
+  200 Script output follows
+   changeset 0:<a href="/rev/2ef0ac749a14?style=paper">2ef0ac749a14</a>
+
+  $ get-with-headers.py localhost:$HGPORT 'rev/d%2Fe%2Ff?style=paper' | egrep '^200|changeset 0:'
+  200 Script output follows
+   changeset 0:<a href="/rev/2ef0ac749a14?style=paper">2ef0ac749a14</a>
+
 no style can be loaded from directories other than the specified paths
 
   $ mkdir -p x/templates/fallback