Patchwork templatefilters: don't escape <> in JSON

login
register
mail settings
Submitter Gregory Szorc
Date Jan. 16, 2015, 5:04 a.m.
Message ID <a07b22eefd8e4c629b73.1421384681@3.1.168.192.in-addr.arpa>
Download mbox | patch
Permalink /patch/7487/
State Rejected
Delegated to: Matt Mackall
Headers show

Comments

Gregory Szorc - Jan. 16, 2015, 5:04 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1421384385 28800
#      Thu Jan 15 20:59:45 2015 -0800
# Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
# Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
templatefilters: don't escape <> in JSON

55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
specification claiming that these are special characters that need to be
escaped. Furthermore, feeding these characters through both Python's and
SpiderMonkey's JSON serialization API revealed no escaping.
Pierre-Yves David - Jan. 16, 2015, 8:48 a.m.
On 01/15/2015 09:04 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1421384385 28800
> #      Thu Jan 15 20:59:45 2015 -0800
> # Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> # Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> templatefilters: don't escape <> in JSON
>
> 55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> specification claiming that these are special characters that need to be
> escaped. Furthermore, feeding these characters through both Python's and
> SpiderMonkey's JSON serialization API revealed no escaping.

CCing matt as he is the paranoid who did this. Tests or more extensible 
explanation would be cool.

>
> diff --git a/mercurial/templatefilters.py b/mercurial/templatefilters.py
> --- a/mercurial/templatefilters.py
> +++ b/mercurial/templatefilters.py
> @@ -221,9 +221,9 @@ def _uescape(c):
>
>   _escapes = [
>       ('\\', '\\\\'), ('"', '\\"'), ('\t', '\\t'), ('\n', '\\n'),
>       ('\r', '\\r'), ('\f', '\\f'), ('\b', '\\b'),
> -    ('<', '\\u003c'), ('>', '\\u003e'), ('\0', '\\u0000')
> +    ('\0', '\\u0000'),
>   ]
>
>   def jsonescape(s):
>       for k, v in _escapes:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Augie Fackler - Jan. 16, 2015, 2:58 p.m.
On Fri, Jan 16, 2015 at 12:48:15AM -0800, Pierre-Yves David wrote:
>
>
> On 01/15/2015 09:04 PM, Gregory Szorc wrote:
> ># HG changeset patch
> ># User Gregory Szorc <gregory.szorc@gmail.com>
> ># Date 1421384385 28800
> >#      Thu Jan 15 20:59:45 2015 -0800
> ># Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> ># Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> >templatefilters: don't escape <> in JSON
> >
> >55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> >specification claiming that these are special characters that need to be
> >escaped. Furthermore, feeding these characters through both Python's and
> >SpiderMonkey's JSON serialization API revealed no escaping.
>
> CCing matt as he is the paranoid who did this. Tests or more extensible
> explanation would be cool.

I believe this defends against certain types of XSS attacks. This[0]
site reinforces that belief. I'm not sure of any specific modern
browsers this defends, but I'm happy to retain this escaping.

0: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Output_Encoding_Rules_Summary



>
> >
> >diff --git a/mercurial/templatefilters.py b/mercurial/templatefilters.py
> >--- a/mercurial/templatefilters.py
> >+++ b/mercurial/templatefilters.py
> >@@ -221,9 +221,9 @@ def _uescape(c):
> >
> >  _escapes = [
> >      ('\\', '\\\\'), ('"', '\\"'), ('\t', '\\t'), ('\n', '\\n'),
> >      ('\r', '\\r'), ('\f', '\\f'), ('\b', '\\b'),
> >-    ('<', '\\u003c'), ('>', '\\u003e'), ('\0', '\\u0000')
> >+    ('\0', '\\u0000'),
> >  ]
> >
> >  def jsonescape(s):
> >      for k, v in _escapes:
> >_______________________________________________
> >Mercurial-devel mailing list
> >Mercurial-devel@selenic.com
> >http://selenic.com/mailman/listinfo/mercurial-devel
> >
>
> --
> Pierre-Yves David
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Gregory Szorc - Jan. 16, 2015, 5:39 p.m.
On Fri, Jan 16, 2015 at 6:58 AM, Augie Fackler <raf@durin42.com> wrote:

> On Fri, Jan 16, 2015 at 12:48:15AM -0800, Pierre-Yves David wrote:
> >
> >
> > On 01/15/2015 09:04 PM, Gregory Szorc wrote:
> > ># HG changeset patch
> > ># User Gregory Szorc <gregory.szorc@gmail.com>
> > ># Date 1421384385 28800
> > >#      Thu Jan 15 20:59:45 2015 -0800
> > ># Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> > ># Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> > >templatefilters: don't escape <> in JSON
> > >
> > >55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> > >specification claiming that these are special characters that need to be
> > >escaped. Furthermore, feeding these characters through both Python's and
> > >SpiderMonkey's JSON serialization API revealed no escaping.
> >
> > CCing matt as he is the paranoid who did this. Tests or more extensible
> > explanation would be cool.
>
> I believe this defends against certain types of XSS attacks. This[0]
> site reinforces that belief. I'm not sure of any specific modern
> browsers this defends, but I'm happy to retain this escaping.
>
> 0:
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Output_Encoding_Rules_Summary
>


I'm pretty sure if you are doing an e.g. .innerHTML = <output from
Mercurial>, you are going to get XSS no matter what.

The \uXXXX encoding here does nothing more than change the representation
in the JSON serialization: anything that parses it will normalize \u003c to
<. As evidenced by SpiderMonkey:

    js> JSON.parse('"\\u003c\\u003e"')
    "<>"

FTR, I've seen some insanely weird JSON encoders that perform the \u
escaping on all non-reserved characters, even those in the ASCII range.
It's perfectly valid JSON. It comes out as ASCII on the other end:

    js> JSON.parse('"\\u0068\\u0065\\u006c\\u006c\\u006f"')
    "hello"

If you care about preventing XSS, you need to HTML escape content from
JSON. Period. The alternative would be to HTML escape strings before they
go into the JSON payload. But that would be a violation because Mercurial's
JSON is not always destined for HTML. Mercurial's JSON should be "pure" and
target agnostic. In other words, it should follow the spec (RFC 4627).
Augie Fackler - Jan. 16, 2015, 5:41 p.m.
On Jan 16, 2015, at 12:39 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:

> On Fri, Jan 16, 2015 at 6:58 AM, Augie Fackler <raf@durin42.com> wrote:
> On Fri, Jan 16, 2015 at 12:48:15AM -0800, Pierre-Yves David wrote:
> >
> >
> > On 01/15/2015 09:04 PM, Gregory Szorc wrote:
> > ># HG changeset patch
> > ># User Gregory Szorc <gregory.szorc@gmail.com>
> > ># Date 1421384385 28800
> > >#      Thu Jan 15 20:59:45 2015 -0800
> > ># Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> > ># Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> > >templatefilters: don't escape <> in JSON
> > >
> > >55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> > >specification claiming that these are special characters that need to be
> > >escaped. Furthermore, feeding these characters through both Python's and
> > >SpiderMonkey's JSON serialization API revealed no escaping.
> >
> > CCing matt as he is the paranoid who did this. Tests or more extensible
> > explanation would be cool.
> 
> I believe this defends against certain types of XSS attacks. This[0]
> site reinforces that belief. I'm not sure of any specific modern
> browsers this defends, but I'm happy to retain this escaping.
> 
> 0: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#Output_Encoding_Rules_Summary
> 
> 
> I'm pretty sure if you are doing an e.g. .innerHTML = <output from Mercurial>, you are going to get XSS no matter what.
> 
> The \uXXXX encoding here does nothing more than change the representation in the JSON serialization: anything that parses it will normalize \u003c to <. As evidenced by SpiderMonkey:
> 
>     js> JSON.parse('"\\u003c\\u003e"')
>     "<>"
> 
> FTR, I've seen some insanely weird JSON encoders that perform the \u escaping on all non-reserved characters, even those in the ASCII range. It's perfectly valid JSON. It comes out as ASCII on the other end:
> 
>     js> JSON.parse('"\\u0068\\u0065\\u006c\\u006c\\u006f"')
>     "hello"

Yes, after it's parsed. I believe the owasp guidelines (and similar ones I've seen internally at Google) are around people doing stupid stuff with the raw un-parsed json.

> 
> If you care about preventing XSS, you need to HTML escape content from JSON. Period. The alternative would be to HTML escape strings before they go into the JSON payload. But that would be a violation because Mercurial's JSON is not always destined for HTML. Mercurial's JSON should be "pure" and target agnostic. In other words, it should follow the spec (RFC 4627).
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Mackall - Jan. 16, 2015, 7:20 p.m.
On Thu, 2015-01-15 at 21:04 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1421384385 28800
> #      Thu Jan 15 20:59:45 2015 -0800
> # Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> # Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> templatefilters: don't escape <> in JSON
> 
> 55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> specification claiming that these are special characters that need to be
> escaped. Furthermore, feeding these characters through both Python's and
> SpiderMonkey's JSON serialization API revealed no escaping.

The original patch was not technically paranoid because I already had an
hgweb exploit. Discovered it somewhere while working through the first
dozen levels of this:

http://escape.alf.nu/
Gregory Szorc - Jan. 16, 2015, 7:24 p.m.
On 1/16/15 11:20 AM, Matt Mackall wrote:
> On Thu, 2015-01-15 at 21:04 -0800, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1421384385 28800
>> #      Thu Jan 15 20:59:45 2015 -0800
>> # Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
>> # Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
>> templatefilters: don't escape <> in JSON
>>
>> 55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
>> specification claiming that these are special characters that need to be
>> escaped. Furthermore, feeding these characters through both Python's and
>> SpiderMonkey's JSON serialization API revealed no escaping.
>
> The original patch was not technically paranoid because I already had an
> hgweb exploit. Discovered it somewhere while working through the first
> dozen levels of this:
>
> http://escape.alf.nu/

Well, then I argue it's a failure in hgweb to escape content from JSON 
entities.

It would be a general issue if this set of escapes applied to other 
encoders (like HTML). But I could find no such use.
Pierre-Yves David - March 4, 2015, 11:37 p.m.
On 01/16/2015 11:24 AM, Gregory Szorc wrote:
> On 1/16/15 11:20 AM, Matt Mackall wrote:
>> On Thu, 2015-01-15 at 21:04 -0800, Gregory Szorc wrote:
>>> # HG changeset patch
>>> # User Gregory Szorc <gregory.szorc@gmail.com>
>>> # Date 1421384385 28800
>>> #      Thu Jan 15 20:59:45 2015 -0800
>>> # Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
>>> # Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
>>> templatefilters: don't escape <> in JSON
>>>
>>> 55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
>>> specification claiming that these are special characters that need to be
>>> escaped. Furthermore, feeding these characters through both Python's and
>>> SpiderMonkey's JSON serialization API revealed no escaping.
>>
>> The original patch was not technically paranoid because I already had an
>> hgweb exploit. Discovered it somewhere while working through the first
>> dozen levels of this:
>>
>> http://escape.alf.nu/
>
> Well, then I argue it's a failure in hgweb to escape content from JSON
> entities.
>
> It would be a general issue if this set of escapes applied to other
> encoders (like HTML). But I could find no such use.

So, what is the status of this. Matt are you convinced by Greg argument 
(so we take the patch) or should we drop it. If we drop it do we need 
something else?
Matt Mackall - March 5, 2015, 6:36 p.m.
On Wed, 2015-03-04 at 15:37 -0800, Pierre-Yves David wrote:
> 
> On 01/16/2015 11:24 AM, Gregory Szorc wrote:
> > On 1/16/15 11:20 AM, Matt Mackall wrote:
> >> On Thu, 2015-01-15 at 21:04 -0800, Gregory Szorc wrote:
> >>> # HG changeset patch
> >>> # User Gregory Szorc <gregory.szorc@gmail.com>
> >>> # Date 1421384385 28800
> >>> #      Thu Jan 15 20:59:45 2015 -0800
> >>> # Node ID a07b22eefd8e4c629b739778b3ca5f3d53a8b1de
> >>> # Parent  049a9e3a078d7c988cb12ed456aad6ec2779ea69
> >>> templatefilters: don't escape <> in JSON
> >>>
> >>> 55c763926a28 added escaping of "<" and ">" in JSON. I could not find any
> >>> specification claiming that these are special characters that need to be
> >>> escaped. Furthermore, feeding these characters through both Python's and
> >>> SpiderMonkey's JSON serialization API revealed no escaping.
> >>
> >> The original patch was not technically paranoid because I already had an
> >> hgweb exploit. Discovered it somewhere while working through the first
> >> dozen levels of this:
> >>
> >> http://escape.alf.nu/
> >
> > Well, then I argue it's a failure in hgweb to escape content from JSON
> > entities.
> >
> > It would be a general issue if this set of escapes applied to other
> > encoders (like HTML). But I could find no such use.
> 
> So, what is the status of this. Matt are you convinced by Greg argument 
> (so we take the patch) or should we drop it. If we drop it do we need 
> something else?

I dropped it ages ago.

I know there's (multiple) exploits here, so obviously the patch can't go
in by itself.

I also know that the known exploits are not going to be obvious to
future template writers and reviewers, and there are probably some
unknown exploits we'll be defended from as well. As the JSON spec allows
escaping any and all characters, there's really no good argument for
removing the safety net here.

(FWIW, Facebook escapes the letter 'e' in their back-end to make it
dead-obvious if some data path hasn't been sanitized properly.)

Patch

diff --git a/mercurial/templatefilters.py b/mercurial/templatefilters.py
--- a/mercurial/templatefilters.py
+++ b/mercurial/templatefilters.py
@@ -221,9 +221,9 @@  def _uescape(c):
 
 _escapes = [
     ('\\', '\\\\'), ('"', '\\"'), ('\t', '\\t'), ('\n', '\\n'),
     ('\r', '\\r'), ('\f', '\\f'), ('\b', '\\b'),
-    ('<', '\\u003c'), ('>', '\\u003e'), ('\0', '\\u0000')
+    ('\0', '\\u0000'),
 ]
 
 def jsonescape(s):
     for k, v in _escapes: