Patchwork hgweb: require files and directory links to begin with 'path:'

login
register
mail settings
Submitter Angel Ezquerra
Date March 20, 2013, 10:15 p.m.
Message ID <84f24bbe0c3e97e4e175.1363817741@Angel-PC.localdomain>
Download mbox | patch
Permalink /patch/1144/
State Rejected, archived
Headers show

Comments

Angel Ezquerra - March 20, 2013, 10:15 p.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1363809088 -3600
#      Wed Mar 20 20:51:28 2013 +0100
# Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
# Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
hgweb: require files and directory links to begin with 'path:'

If they don't the server will reply with a 403 HTTP forbidden error. This gets
rid of the need to explicitly check for the known pattern types.

Note that the templater has access to a "path" variable which is a path to the
current file or directory relative to the root of the repository, and which
begins with a "/". However archival.archive() expects 'path:' to not begin with
a '/'. To cope with this webcommands.archive must remove the extra '/' which is
passed by the templater.
Angel Ezquerra - March 20, 2013, 10:23 p.m.
On Wed, Mar 20, 2013 at 11:15 PM, Angel Ezquerra
<angel.ezquerra@gmail.com> wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1363809088 -3600
> #      Wed Mar 20 20:51:28 2013 +0100
> # Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
> # Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
> hgweb: require files and directory links to begin with 'path:'
>
> If they don't the server will reply with a 403 HTTP forbidden error. This gets
> rid of the need to explicitly check for the known pattern types.
>
> Note that the templater has access to a "path" variable which is a path to the
> current file or directory relative to the root of the repository, and which
> begins with a "/". However archival.archive() expects 'path:' to not begin with
> a '/'. To cope with this webcommands.archive must remove the extra '/' which is
> passed by the templater.

Sorry, I forgot to mark this as RFC as I have not updated the tests.

This patch is in response to the recent thread between me, Kevin and
Pierre-Yves.

The code is a little clunky because I had to manually remove the extra
slash that the templater adds to the current path. I didn't find a way
to remove the extra slash using the template engine, so I had to
remove it on the webcommands.archive side.

Comments, particularly on how to improve this last part would be nice.

Angel
Pierre-Yves David - March 21, 2013, 10:31 a.m.
On Wed, Mar 20, 2013 at 11:15:41PM +0100, Angel Ezquerra wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1363809088 -3600
> #      Wed Mar 20 20:51:28 2013 +0100
> # Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
> # Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
> hgweb: require files and directory links to begin with 'path:'
> 
> If they don't the server will reply with a 403 HTTP forbidden error. This gets
> rid of the need to explicitly check for the known pattern types.

I really fails to grasp why you complexify for whole argument process for the
shake of being able to replay a 403 in some rare error case.

In my opinion, the "file" argument should be documented as a plain path and
processed as such. It make not sense to allows any pattern here. They should
not be recognised. Returning a 404 in this case is fine by me for the shake of
simplicity.


> Note that the templater has access to a "path" variable which is a path to the
> current file or directory relative to the root of the repository, and which
> begins with a "/". However archival.archive() expects 'path:' to not begin with
> a '/'. To cope with this webcommands.archive must remove the extra '/' which is
> passed by the templater.
> 
> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py
> +++ b/mercurial/hgweb/webcommands.py
> @@ -822,12 +822,11 @@
>      file = req.form.get('file', None)
>      if file:
>          file = file[0]
> -        patandfile = file.split(':')
> -        if len(patandfile) > 1 and patandfile[0].lower() in ('glob', 'relglob',
> -                'path', 'relpath', 're', 'relre', 'set'):
> -            msg = 'Archive pattern not allowed: %s' % file
> +        if not file.lower().startswith('path:/'):
> +            msg = "Archive path must begin with 'path:'"
>              raise ErrorResponse(HTTP_FORBIDDEN, msg)
> -        pats = ['path:' + file]
> +        # The file path has an extra "/" that must be removed
> +        pats = ['path:' + file[6:]]

consider using lstrip('/')
Angel Ezquerra - March 21, 2013, 11:30 a.m.
On Thu, Mar 21, 2013 at 11:31 AM, Pierre-Yves David
<pierre-yves.david@logilab.fr> wrote:
> On Wed, Mar 20, 2013 at 11:15:41PM +0100, Angel Ezquerra wrote:
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1363809088 -3600
>> #      Wed Mar 20 20:51:28 2013 +0100
>> # Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
>> # Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
>> hgweb: require files and directory links to begin with 'path:'
>>
>> If they don't the server will reply with a 403 HTTP forbidden error. This gets
>> rid of the need to explicitly check for the known pattern types.
>
> I really fails to grasp why you complexify for whole argument process for the
> shake of being able to replay a 403 in some rare error case.
>
> In my opinion, the "file" argument should be documented as a plain path and
> processed as such. It make not sense to allows any pattern here. They should
> not be recognised. Returning a 404 in this case is fine by me for the shake of
> simplicity.

I'm confused. I thought this patch _simplified_ the code. If we did
not have to remove the extra starting slash the code would be as
simple as doing a simple file.startswith('path:') check.

Note that without this check archival will happily return an empty
archive. The user cannot tell whether its input patter is wrong or the
file simply doesn't exist.

>> Note that the templater has access to a "path" variable which is a path to the
>> current file or directory relative to the root of the repository, and which
>> begins with a "/". However archival.archive() expects 'path:' to not begin with
>> a '/'. To cope with this webcommands.archive must remove the extra '/' which is
>> passed by the templater.
>>
>> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
>> --- a/mercurial/hgweb/webcommands.py
>> +++ b/mercurial/hgweb/webcommands.py
>> @@ -822,12 +822,11 @@
>>      file = req.form.get('file', None)
>>      if file:
>>          file = file[0]
>> -        patandfile = file.split(':')
>> -        if len(patandfile) > 1 and patandfile[0].lower() in ('glob', 'relglob',
>> -                'path', 'relpath', 're', 'relre', 'set'):
>> -            msg = 'Archive pattern not allowed: %s' % file
>> +        if not file.lower().startswith('path:/'):
>> +            msg = "Archive path must begin with 'path:'"
>>              raise ErrorResponse(HTTP_FORBIDDEN, msg)
>> -        pats = ['path:' + file]
>> +        # The file path has an extra "/" that must be removed
>> +        pats = ['path:' + file[6:]]
>
> consider using lstrip('/')
>

Neat, I did not know about that.

Should I resend the patch with this little change or are you agains
the patch as a whole?

Cheers,

Angel
Angel Ezquerra - March 21, 2013, 9:07 p.m.
On Thu, Mar 21, 2013 at 3:51 PM, Kevin Bullock
<kbullock+mercurial@ringworld.org> wrote:
> On 20 Mar 2013, at 5:15 PM, Angel Ezquerra wrote:
>
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1363809088 -3600
>> #      Wed Mar 20 20:51:28 2013 +0100
>> # Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
>> # Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
>> hgweb: require files and directory links to begin with 'path:'
>
> I'm -1 on this. It simplifies the code slightly but leaks an implementation detail. In fact I'm becoming more convinced that we should just be returning a 404 here anyway, and document the new feature as only taking explicit path names.
>
> Either way, we still need to turn an empty archive into a 404.

OK,

it seems actually quite easy to through an error.Abort when
archival.archive finds no files. I will prepare a patch and send it.
Note that I think we should only throw the exception if we pass a
pattern and it matches no files. If we pass no pattern and there are
no files (e.g. the repo is empty) then there should be no exception
thrown IMO.

Cheers,

Angel
Angel Ezquerra - March 21, 2013, 10:08 p.m.
On Thu, Mar 21, 2013 at 10:07 PM, Angel Ezquerra
<angel.ezquerra@gmail.com> wrote:
> On Thu, Mar 21, 2013 at 3:51 PM, Kevin Bullock
> <kbullock+mercurial@ringworld.org> wrote:
>> On 20 Mar 2013, at 5:15 PM, Angel Ezquerra wrote:
>>
>>> # HG changeset patch
>>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>>> # Date 1363809088 -3600
>>> #      Wed Mar 20 20:51:28 2013 +0100
>>> # Node ID 84f24bbe0c3e97e4e1753c894e3c963d1e6c6d63
>>> # Parent  136516cd3d6902aaa2edc9befc65763c56a6dbfc
>>> hgweb: require files and directory links to begin with 'path:'
>>
>> I'm -1 on this. It simplifies the code slightly but leaks an implementation detail. In fact I'm becoming more convinced that we should just be returning a 404 here anyway, and document the new feature as only taking explicit path names.
>>
>> Either way, we still need to turn an empty archive into a 404.
>
> OK,
>
> it seems actually quite easy to through an error.Abort when
> archival.archive finds no files. I will prepare a patch and send it.
> Note that I think we should only throw the exception if we pass a
> pattern and it matches no files. If we pass no pattern and there are
> no files (e.g. the repo is empty) then there should be no exception
> thrown IMO.
>
> Cheers,
>
> Angel

Umm, this is a little less straight forward that it seemd.

It is easy to make archival.archive raise an exception when no files are match.

The problem is that when webcommands.archive calls archival.archive,
it has already sent an HTTP_OK response. Thus, the exception is risen,
but the client never sees it.

I think you must send HTTP_OK before a succesful archival.archive().
The only way I can see to generate the exception earlier is to look
for files matching the pattern outside of archival.archive. That is
something like:

    files = [f for f in ctx.manifest().keys() if matchfn(f)]

But then we'd be doing this twice (once outside archival.archive and
once within).

Would that be acceptable?

Cheers,

Angel

Patch

diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py
+++ b/mercurial/hgweb/webcommands.py
@@ -822,12 +822,11 @@ 
     file = req.form.get('file', None)
     if file:
         file = file[0]
-        patandfile = file.split(':')
-        if len(patandfile) > 1 and patandfile[0].lower() in ('glob', 'relglob',
-                'path', 'relpath', 're', 'relre', 'set'):
-            msg = 'Archive pattern not allowed: %s' % file
+        if not file.lower().startswith('path:/'):
+            msg = "Archive path must begin with 'path:'"
             raise ErrorResponse(HTTP_FORBIDDEN, msg)
-        pats = ['path:' + file]
+        # The file path has an extra "/" that must be removed
+        pats = ['path:' + file[6:]]
 
     mimetype, artype, extension, encoding = web.archive_specs[type_]
     headers = [
diff --git a/mercurial/templates/coal/map b/mercurial/templates/coal/map
--- a/mercurial/templates/coal/map
+++ b/mercurial/templates/coal/map
@@ -224,7 +224,7 @@ 
 index = ../paper/index.tmpl
 archiveentry = '
   <li>
-    <a href="{url|urlescape}archive/{node|short}{extension|urlescape}{ifeq(path,'/','',path|urlescape)}">{type|escape}</a>
+    <a href="{url|urlescape}archive/{node|short}{extension|urlescape}{ifeq(path,'/','','path:{path|urlescape}')}">{type|escape}</a>
   </li>'
 notfound = ../paper/notfound.tmpl
 error = ../paper/error.tmpl
diff --git a/mercurial/templates/gitweb/map b/mercurial/templates/gitweb/map
--- a/mercurial/templates/gitweb/map
+++ b/mercurial/templates/gitweb/map
@@ -289,7 +289,7 @@ 
     <td class="link">
       <a href="{url|urlescape}file/{node|short}/{file|urlescape}{sessionvars%urlparameter}">file</a>&nbsp;|&nbsp;<a href="{url|urlescape}diff/{node|short}/{file|urlescape}{sessionvars%urlparameter}">diff</a>&nbsp;|&nbsp;<a href="{url|urlescape}annotate/{node|short}/{file|urlescape}{sessionvars%urlparameter}">annotate</a> {rename%filelogrename}</td>
     </tr>'
-archiveentry = ' | <a href="{url|urlescape}archive/{node|short}{extension}{ifeq(path,'/','',path|urlescape)}">{type|escape}</a> '
+archiveentry = ' | <a href="{url|urlescape}archive/{node|short}{extension}{ifeq(path,'/','','/path:{path|urlescape}')}">{type|escape}</a> '
 indexentry = '
   <tr class="parity{parity}">
     <td>
diff --git a/mercurial/templates/monoblue/map b/mercurial/templates/monoblue/map
--- a/mercurial/templates/monoblue/map
+++ b/mercurial/templates/monoblue/map
@@ -245,7 +245,7 @@ 
       {rename%filelogrename}
     </td>
   </tr>'
-archiveentry = '<li><a href="{url|urlescape}archive/{node|short}{extension}{ifeq(path,'/','',path|urlescape)}">{type|escape}</a></li>'
+archiveentry = '<li><a href="{url|urlescape}archive/{node|short}{extension}{ifeq(path,'/','','/path:{path|urlescape}')}">{type|escape}</a></li>'
 indexentry = '
   <tr class="parity{parity}">
     <td><a href="{url|urlescape}{sessionvars%urlparameter}">{name|escape}</a></td>
diff --git a/mercurial/templates/paper/map b/mercurial/templates/paper/map
--- a/mercurial/templates/paper/map
+++ b/mercurial/templates/paper/map
@@ -232,7 +232,7 @@ 
 index = index.tmpl
 archiveentry = '
   <li>
-    <a href="{url|urlescape}archive/{node|short}{extension|urlescape}{ifeq(path,'/','',path|urlescape)}">{type|escape}</a>
+    <a href="{url|urlescape}archive/{node|short}{extension|urlescape}{ifeq(path,'/','','/path:{path|urlescape}')}">{type|escape}</a>
   </li>'
 notfound = notfound.tmpl
 error = error.tmpl