Patchwork [2,of,3,V6] hgweb: teach archive how to download a specific directory or file

login
register
mail settings
Submitter Angel Ezquerra
Date Feb. 27, 2013, 5:06 p.m.
Message ID <b9fda7cbf2c239cecd66.1361984793@Angel-PC.localdomain>
Download mbox | patch
Permalink /patch/1061/
State Superseded
Commit bb38f4f78104f16ba3538df3b8cb3465755e5f57
Headers show

Comments

Angel Ezquerra - Feb. 27, 2013, 5:06 p.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1360493525 -3600
# Node ID b9fda7cbf2c239cecd667159391befb440edba80
# Parent  a1aab0b30144b6d9b383e8021d24eecd5861c84e
hgweb: teach archive how to download a specific directory or file

The archive web command now takes into account the "file" request entry, if one
is provided.

The provided "file" is processed as a "path" corresponding to a directory or
file that will be downloaded.

With this change hgweb can to process requests such as:

    http://mercurial.selenic.com/hg/archive/tip.zip/mercurial/templates

This will download all files on the mercurial/templates directory as a zip file.
It is not possible to specify file patterns. Those will be rejected with a 403
reply fromthe server.

Note that this is a first step to add support for downloading directories from
the web interface. Currently the only way to use this feature is by manually
constructing the URL that you want to download. We will have to modify the
archiveentry map entry on the different templates so that it adds the current
folder path to the archive links.
Pierre-Yves David - March 12, 2013, 5:29 p.m.
On Wed, Feb 27, 2013 at 06:06:33PM +0100, Angel Ezquerra wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1360493525 -3600
> # Node ID b9fda7cbf2c239cecd667159391befb440edba80
> # Parent  a1aab0b30144b6d9b383e8021d24eecd5861c84e
> hgweb: teach archive how to download a specific directory or file
> 
> The archive web command now takes into account the "file" request entry, if one
> is provided.
> 
> The provided "file" is processed as a "path" corresponding to a directory or
> file that will be downloaded.
> 
> With this change hgweb can to process requests such as:
> 
>     http://mercurial.selenic.com/hg/archive/tip.zip/mercurial/templates
> 
> This will download all files on the mercurial/templates directory as a zip file.
> It is not possible to specify file patterns. Those will be rejected with a 403
> reply fromthe server.
> 
> Note that this is a first step to add support for downloading directories from
> the web interface. Currently the only way to use this feature is by manually
> constructing the URL that you want to download. We will have to modify the
> archiveentry map entry on the different templates so that it adds the current
> folder path to the archive links.
> 
> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py
> +++ b/mercurial/hgweb/webcommands.py
> @@ -816,6 +816,17 @@
>      if cnode == key or key == 'tip':
>          arch_version = short(cnode)
>      name = "%s-%s" % (reponame, arch_version)
> +
> +    ctx = webutil.changectx(web.repo, req)
> +    pats = []
> +    file = req.form.get('file', None)
> +    if file:
> +        file = file[0]
> +        if ':' in file:
> +            msg = 'Archive pattern not allowed: %s' % file
> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
> +        pats = ['path:' + file]

What about file with ":" is there name ?

> diff --git a/tests/test-archive.t b/tests/test-archive.t
> --- a/tests/test-archive.t
> +++ b/tests/test-archive.t
> @@ -100,6 +100,13 @@
>        testing: test-archive-2c0277f05ed4/baz/bletch   OK
>        testing: test-archive-2c0277f05ed4/foo   OK
>    No errors detected in compressed data of archive.zip.
> +  $ python getarchive.py "$TIP" gz baz | gunzip | tar tf - 2>/dev/null
> +  test-archive-2c0277f05ed4/baz/bletch

Is that the test for a directory of a file?

You should test the other too.

> +test that we reject unsafe patterns
> +
> +  $ python getarchive.py "$TIP" gz relre:baz
> +  HTTP Error 403: Archive pattern not allowed: relre:baz
>  
>    $ "$TESTDIR/killdaemons.py" $DAEMON_PIDS

Patches looks great otherwise!
Angel Ezquerra - March 12, 2013, 9:47 p.m.
On Tue, Mar 12, 2013 at 6:29 PM, Pierre-Yves David
<pierre-yves.david@logilab.fr> wrote:
> On Wed, Feb 27, 2013 at 06:06:33PM +0100, Angel Ezquerra wrote:
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1360493525 -3600
>> # Node ID b9fda7cbf2c239cecd667159391befb440edba80
>> # Parent  a1aab0b30144b6d9b383e8021d24eecd5861c84e
>> hgweb: teach archive how to download a specific directory or file
>>
>> The archive web command now takes into account the "file" request entry, if one
>> is provided.
>>
>> The provided "file" is processed as a "path" corresponding to a directory or
>> file that will be downloaded.
>>
>> With this change hgweb can to process requests such as:
>>
>>     http://mercurial.selenic.com/hg/archive/tip.zip/mercurial/templates
>>
>> This will download all files on the mercurial/templates directory as a zip file.
>> It is not possible to specify file patterns. Those will be rejected with a 403
>> reply fromthe server.
>>
>> Note that this is a first step to add support for downloading directories from
>> the web interface. Currently the only way to use this feature is by manually
>> constructing the URL that you want to download. We will have to modify the
>> archiveentry map entry on the different templates so that it adds the current
>> folder path to the archive links.
>>
>> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
>> --- a/mercurial/hgweb/webcommands.py
>> +++ b/mercurial/hgweb/webcommands.py
>> @@ -816,6 +816,17 @@
>>      if cnode == key or key == 'tip':
>>          arch_version = short(cnode)
>>      name = "%s-%s" % (reponame, arch_version)
>> +
>> +    ctx = webutil.changectx(web.repo, req)
>> +    pats = []
>> +    file = req.form.get('file', None)
>> +    if file:
>> +        file = file[0]
>> +        if ':' in file:
>> +            msg = 'Archive pattern not allowed: %s' % file
>> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
>> +        pats = ['path:' + file]
>
> What about file with ":" is there name ?

I'd say that a user that created such a file deserves to not be able
to download it ;-)

Jokes aside, the alternative is to explicitly test for the different
patterns (e.g. relpath:, relre:, etc). The problem with that is that
if we ever introduce a new pattern we may forget to update this filter
and then let a malicious user use such a pattern.

So I don't know what is the best solution to this... Suggestions?


>> diff --git a/tests/test-archive.t b/tests/test-archive.t
>> --- a/tests/test-archive.t
>> +++ b/tests/test-archive.t
>> @@ -100,6 +100,13 @@
>>        testing: test-archive-2c0277f05ed4/baz/bletch   OK
>>        testing: test-archive-2c0277f05ed4/foo   OK
>>    No errors detected in compressed data of archive.zip.
>> +  $ python getarchive.py "$TIP" gz baz | gunzip | tar tf - 2>/dev/null
>> +  test-archive-2c0277f05ed4/baz/bletch
>
> Is that the test for a directory of a file?
>
> You should test the other too.

You are right. That was testing archiving a single file. I'll add
another test that archives a directory.

Good catch!

>
>> +test that we reject unsafe patterns
>> +
>> +  $ python getarchive.py "$TIP" gz relre:baz
>> +  HTTP Error 403: Archive pattern not allowed: relre:baz
>>
>>    $ "$TESTDIR/killdaemons.py" $DAEMON_PIDS
>
> Patches looks great otherwise!

Thanks for the review. Once we decide the best way to reject patterns
(if the current approach is not acceptable) I will resend the series
with the additional test.

Cheers,

Angel
Angel Ezquerra - March 12, 2013, 10:10 p.m.
On Tue, Mar 12, 2013 at 11:08 PM, Kevin Bullock
<kbullock+mercurial@ringworld.org> wrote:
> On 12 Mar 2013, at 4:47 PM, Angel Ezquerra wrote:
>
>> On Tue, Mar 12, 2013 at 6:29 PM, Pierre-Yves David
>> <pierre-yves.david@logilab.fr> wrote:
>>> On Wed, Feb 27, 2013 at 06:06:33PM +0100, Angel Ezquerra wrote:
>>>> +        if ':' in file:
>>>> +            msg = 'Archive pattern not allowed: %s' % file
>>>> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
>>>> +        pats = ['path:' + file]
>>>
>>> What about file with ":" is there name ?
>>
>> I'd say that a user that created such a file deserves to not be able
>> to download it ;-)
>>
>> Jokes aside, the alternative is to explicitly test for the different
>> patterns (e.g. relpath:, relre:, etc). The problem with that is that
>> if we ever introduce a new pattern we may forget to update this filter
>> and then let a malicious user use such a pattern.
>>
>> So I don't know what is the best solution to this... Suggestions?
>
> If you unconditionally prefix the filename with 'path:', it will be interpreted as a path (even if it contains a ':' later in the name). Then you'd presumably fall back on a 'file not found' error instead of the 403 Forbidden, which might even be preferable from a security standpoint.
>

OK, I will do as you suggest. Thanks!

Angel
Angel Ezquerra - March 12, 2013, 10:45 p.m.
On Tue, Mar 12, 2013 at 11:10 PM, Angel Ezquerra
<angel.ezquerra@gmail.com> wrote:
> On Tue, Mar 12, 2013 at 11:08 PM, Kevin Bullock
> <kbullock+mercurial@ringworld.org> wrote:
>> On 12 Mar 2013, at 4:47 PM, Angel Ezquerra wrote:
>>
>>> On Tue, Mar 12, 2013 at 6:29 PM, Pierre-Yves David
>>> <pierre-yves.david@logilab.fr> wrote:
>>>> On Wed, Feb 27, 2013 at 06:06:33PM +0100, Angel Ezquerra wrote:
>>>>> +        if ':' in file:
>>>>> +            msg = 'Archive pattern not allowed: %s' % file
>>>>> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
>>>>> +        pats = ['path:' + file]
>>>>
>>>> What about file with ":" is there name ?
>>>
>>> I'd say that a user that created such a file deserves to not be able
>>> to download it ;-)
>>>
>>> Jokes aside, the alternative is to explicitly test for the different
>>> patterns (e.g. relpath:, relre:, etc). The problem with that is that
>>> if we ever introduce a new pattern we may forget to update this filter
>>> and then let a malicious user use such a pattern.
>>>
>>> So I don't know what is the best solution to this... Suggestions?
>>
>> If you unconditionally prefix the filename with 'path:', it will be interpreted as a path (even if it contains a ':' later in the name). Then you'd presumably fall back on a 'file not found' error instead of the 403 Forbidden, which might even be preferable from a security standpoint.
>>
>
> OK, I will do as you suggest. Thanks!
>
> Angel

Ummm, actually that does not work that well because the server will
simply return an empty archive file.

In fact I was already unconditionally prepending 'path:' to the
requested file so I was wrong and even if in the future we add a new
pattern type that we forget to filter out there is no risk of a
malicious user doing something evil (he will just get an empty
archive).

For the rest of users I think it would be best to keep the 403 error
because it explains what is wrong explicitly.

So what if I just filter out the known patterns?

Angel
Angel Ezquerra - March 13, 2013, 8:34 a.m.
On Wed, Mar 13, 2013 at 3:40 AM, Kevin Bullock
<kbullock+mercurial@ringworld.org> wrote:
> On 12 Mar 2013, at 5:45 PM, Angel Ezquerra wrote:
>
>> On Tue, Mar 12, 2013 at 11:10 PM, Angel Ezquerra
>> <angel.ezquerra@gmail.com> wrote:
>>> On Tue, Mar 12, 2013 at 11:08 PM, Kevin Bullock
>>> <kbullock+mercurial@ringworld.org> wrote:
>>>> On 12 Mar 2013, at 4:47 PM, Angel Ezquerra wrote:
>>>>
>>>>> On Tue, Mar 12, 2013 at 6:29 PM, Pierre-Yves David
>>>>> <pierre-yves.david@logilab.fr> wrote:
>>>>>> On Wed, Feb 27, 2013 at 06:06:33PM +0100, Angel Ezquerra wrote:
>>>>>>> +        if ':' in file:
>>>>>>> +            msg = 'Archive pattern not allowed: %s' % file
>>>>>>> +            raise ErrorResponse(HTTP_FORBIDDEN, msg)
>>>>>>> +        pats = ['path:' + file]
>>>>>>
>>>>>> What about file with ":" is there name ?
>>>>>
>>>>> I'd say that a user that created such a file deserves to not be able
>>>>> to download it ;-)
>>>>>
>>>>> Jokes aside, the alternative is to explicitly test for the different
>>>>> patterns (e.g. relpath:, relre:, etc). The problem with that is that
>>>>> if we ever introduce a new pattern we may forget to update this filter
>>>>> and then let a malicious user use such a pattern.
>>>>>
>>>>> So I don't know what is the best solution to this... Suggestions?
>>>>
>>>> If you unconditionally prefix the filename with 'path:', it will be interpreted as a path (even if it contains a ':' later in the name). Then you'd presumably fall back on a 'file not found' error instead of the 403 Forbidden, which might even be preferable from a security standpoint.
>>>>
>>>
>>> OK, I will do as you suggest. Thanks!
>>>
>>> Angel
>>
>> Ummm, actually that does not work that well because the server will
>> simply return an empty archive file.
>
> That sounds like a bug.

Kevin,

I checked archival.archive() and it does not check whether the number
of files matching the pattern is 0.

I cannot tell if that is the desired behavior or a bug.

In any case I think that is something that, if needs fixing, should be
fixed on a separate patch?

Also, I personally like the fact that we give a 403 error. IMHO it
tells you exactly what is the problem without compromising the
security of the server because we append "path:" to the requested file
path anyway (unless there is some way to feed '\b' into the requested
path?).

So I'd vote to take the latest version of my patch series (V7!) which
explicitly checks for known patterns, and I would deal with the
possible archive.archival bug separately.

Cheers,

Angel
Pierre-Yves David - March 20, 2013, 2:38 p.m.
On Wed, Mar 13, 2013 at 09:34:08AM +0100, Angel Ezquerra wrote:
> I checked archival.archive() and it does not check whether the number
> of files matching the pattern is 0.
> 
> I cannot tell if that is the desired behavior or a bug.
> 
> In any case I think that is something that, if needs fixing, should be
> fixed on a separate patch?

That should probably be fixed. But this certainly deserve another
patches.

> Also, I personally like the fact that we give a 403 error. IMHO it
> tells you exactly what is the problem without compromising the
> security of the server because we append "path:" to the requested file
> path anyway (unless there is some way to feed '\b' into the requested
> path?).
> 
> So I'd vote to take the latest version of my patch series (V7!) which
> explicitly checks for known patterns, and I would deal with the
> possible archive.archival bug separately.

Black-listing security does not work. extension and futur version may
(will) add extra patterns ("et bim"). I vote for requesting plain file
name all the time. this means inconditionnaly adding "path:" in the
front of the parameter.
This will returns 404 on any pattern (including "path:" but plain
pattern are requested)
Angel Ezquerra - March 20, 2013, 3:20 p.m.
On Wed, Mar 20, 2013 at 3:38 PM, Pierre-Yves David
<pierre-yves.david@logilab.fr> wrote:
> On Wed, Mar 13, 2013 at 09:34:08AM +0100, Angel Ezquerra wrote:
>> I checked archival.archive() and it does not check whether the number
>> of files matching the pattern is 0.
>>
>> I cannot tell if that is the desired behavior or a bug.
>>
>> In any case I think that is something that, if needs fixing, should be
>> fixed on a separate patch?
>
> That should probably be fixed. But this certainly deserve another
> patches.

So should we raise an exception in that case?

>> Also, I personally like the fact that we give a 403 error. IMHO it
>> tells you exactly what is the problem without compromising the
>> security of the server because we append "path:" to the requested file
>> path anyway (unless there is some way to feed '\b' into the requested
>> path?).
>>
>> So I'd vote to take the latest version of my patch series (V7!) which
>> explicitly checks for known patterns, and I would deal with the
>> possible archive.archival bug separately.
>
> Black-listing security does not work. extension and futur version may
> (will) add extra patterns ("et bim"). I vote for requesting plain file
> name all the time. this means inconditionnaly adding "path:" in the
> front of the parameter.

I agree. In fact that is what the patch does already. This means that
if you try to use _any_ pattern (including "path") the web server will
respond with an error. The only difference is that if you try to use a
"known" pattern the server will respond with a nice 403 "Archive
pattern not allowed" message. If in the future a new pattern type is
added and this part of the code is not properly updated the web server
will return an empty archive (due to the archival.archive() bug
discussed above).

> This will returns 404 on any pattern (including "path:" but plain
> pattern are requested)

Not until the archival.archive() but is fixed.

BTW, I don't know if you noticed that this series has been crewed.

Cheers,

Angel
Angel Ezquerra - March 20, 2013, 3:34 p.m.
On Wed, Mar 20, 2013 at 4:29 PM, Kevin Bullock
<kbullock+mercurial@ringworld.org> wrote:
> On 20 Mar 2013, at 9:38 AM, Pierre-Yves David wrote:
>
>> On Wed, Mar 13, 2013 at 09:34:08AM +0100, Angel Ezquerra wrote:
>>> Also, I personally like the fact that we give a 403 error. IMHO it
>>> tells you exactly what is the problem without compromising the
>>> security of the server because we append "path:" to the requested file
>>> path anyway (unless there is some way to feed '\b' into the requested
>>> path?).
>>>
>>> So I'd vote to take the latest version of my patch series (V7!) which
>>> explicitly checks for known patterns, and I would deal with the
>>> possible archive.archival bug separately.
>>
>> Black-listing security does not work. extension and futur version may
>> (will) add extra patterns ("et bim"). I vote for requesting plain file
>> name all the time. this means inconditionnaly adding "path:" in the
>> front of the parameter.
>> This will returns 404 on any pattern (including "path:" but plain
>> pattern are requested)
>
> The blacklist is not a security mechanism. It's a nicety to make hgweb respond with a 403 Forbidden instead of an empty archive.
>
> Can we interrogate match.py for its known pattern types, and blacklist those automatically? Looks like it would require some refactoring in match.py, but it would save us some maintenance issues later.

I think this would be a very nice solution.

> We should also fix archive.archival to abort instead of creating an empty archive, and make hgweb return a 404 if it does.

Agreed.

Angel
Pierre-Yves David - March 20, 2013, 4:12 p.m.
On Wed, Mar 20, 2013 at 04:20:26PM +0100, Angel Ezquerra wrote:
> On Wed, Mar 20, 2013 at 3:38 PM, Pierre-Yves David
> <pierre-yves.david@logilab.fr> wrote:
> > On Wed, Mar 13, 2013 at 09:34:08AM +0100, Angel Ezquerra wrote:
> >> I checked archival.archive() and it does not check whether the number
> >> of files matching the pattern is 0.
> >>
> >> I cannot tell if that is the desired behavior or a bug.
> >>
> >> In any case I think that is something that, if needs fixing, should be
> >> fixed on a separate patch?
> >
> > That should probably be fixed. But this certainly deserve another
> > patches.
> 
> So should we raise an exception in that case?
> 
> >> Also, I personally like the fact that we give a 403 error. IMHO it
> >> tells you exactly what is the problem without compromising the
> >> security of the server because we append "path:" to the requested file
> >> path anyway (unless there is some way to feed '\b' into the requested
> >> path?).
> >>
> >> So I'd vote to take the latest version of my patch series (V7!) which
> >> explicitly checks for known patterns, and I would deal with the
> >> possible archive.archival bug separately.
> >
> > Black-listing security does not work. extension and futur version may
> > (will) add extra patterns ("et bim"). I vote for requesting plain file
> > name all the time. this means inconditionnaly adding "path:" in the
> > front of the parameter.
> 
> I agree. In fact that is what the patch does already. This means that
> if you try to use _any_ pattern (including "path") the web server will
> respond with an error. The only difference is that if you try to use a
> "known" pattern the server will respond with a nice 403 "Archive
> pattern not allowed" message.

And what if the repo contains a root directory whose name starts with
this very known pattern?

The nicety turn's into a big bad wolf :-(

I advertise a "simpler is better" route here. (But I'm won't die for it)
Angel Ezquerra - March 20, 2013, 5:09 p.m.
On Wed, Mar 20, 2013 at 5:21 PM, Kevin Bullock
<kbullock+mercurial@ringworld.org> wrote:
>
> On 20 Mar 2013, at 11:12 AM, Pierre-Yves David wrote:
>
>> On Wed, Mar 20, 2013 at 04:20:26PM +0100, Angel Ezquerra wrote:
>>> I agree. In fact that is what the patch does already. This means that
>>> if you try to use _any_ pattern (including "path") the web server will
>>> respond with an error. The only difference is that if you try to use a
>>> "known" pattern the server will respond with a nice 403 "Archive
>>> pattern not allowed" message.
>>
>> And what if the repo contains a root directory whose name starts with
>> this very known pattern?
>
> Do the same thing we do on the command line and let people explicitly prefix such names with 'path:'?

What if, rather than apending 'path:' to the requested path ourselves
we did _require_ that the requested path begins with 'path:'?

I think that would still let us check for invalid patterns while
fixing the issue that Pierre-Yves raised.

Cheers,

Angel

Patch

diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py
+++ b/mercurial/hgweb/webcommands.py
@@ -816,6 +816,17 @@ 
     if cnode == key or key == 'tip':
         arch_version = short(cnode)
     name = "%s-%s" % (reponame, arch_version)
+
+    ctx = webutil.changectx(web.repo, req)
+    pats = []
+    file = req.form.get('file', None)
+    if file:
+        file = file[0]
+        if ':' in file:
+            msg = 'Archive pattern not allowed: %s' % file
+            raise ErrorResponse(HTTP_FORBIDDEN, msg)
+        pats = ['path:' + file]
+
     mimetype, artype, extension, encoding = web.archive_specs[type_]
     headers = [
         ('Content-Disposition', 'attachment; filename=%s%s' % (name, extension))
@@ -825,9 +836,9 @@ 
     req.headers.extend(headers)
     req.respond(HTTP_OK, mimetype)
 
-    ctx = webutil.changectx(web.repo, req)
+    matchfn = scmutil.match(ctx, pats, default='path')
     archival.archive(web.repo, req, cnode, artype, prefix=name,
-                     matchfn=scmutil.match(ctx, []),
+                     matchfn=matchfn,
                      subrepos=web.configbool("web", "archivesubrepos"))
     return []
 
diff --git a/tests/test-archive.t b/tests/test-archive.t
--- a/tests/test-archive.t
+++ b/tests/test-archive.t
@@ -100,6 +100,13 @@ 
       testing: test-archive-2c0277f05ed4/baz/bletch   OK
       testing: test-archive-2c0277f05ed4/foo   OK
   No errors detected in compressed data of archive.zip.
+  $ python getarchive.py "$TIP" gz baz | gunzip | tar tf - 2>/dev/null
+  test-archive-2c0277f05ed4/baz/bletch
+
+test that we reject unsafe patterns
+
+  $ python getarchive.py "$TIP" gz relre:baz
+  HTTP Error 403: Archive pattern not allowed: relre:baz
 
   $ "$TESTDIR/killdaemons.py" $DAEMON_PIDS