Patchwork [3,of,4,py3,V2] pycompat: custom implementation of urllib.parse.quote()

login
register
mail settings
Submitter Gregory Szorc
Date March 13, 2017, 8:08 p.m.
Message ID <ae4c0d8f0ca7218c8b1d.1489435731@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/19304/
State Accepted
Headers show

Comments

Gregory Szorc - March 13, 2017, 8:08 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1489432607 25200
#      Mon Mar 13 12:16:47 2017 -0700
# Node ID ae4c0d8f0ca7218c8b1d259b72660c99b4c085aa
# Parent  4dca07be3c215ee7f47ccd7473d78514968f1bb4
pycompat: custom implementation of urllib.parse.quote()

urllib.parse.quote() accepts either str or bytes and returns str.

There exists a urllib.parse.quote_from_bytes() which only accepts
bytes. We should probably use that to retain strong typing and
avoid surprises.

In addition, since nearly all strings in Mercurial are bytes, we
probably don't want quote() returning unicode.

So, this patch implements a custom quote() that only accepts bytes
and returns bytes. The quoted URL should only contain URL safe
characters which is a strict subset of ASCII. So
`.encode('ascii', 'strict')` should be safe.
Pulkit Goyal - March 14, 2017, 1:48 a.m.
On 13-Mar-2017 1:09 PM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
>
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> #      Mon Mar 13 12:16:47 2017 -0700
> # Node ID ae4c0d8f0ca7218c8b1d259b72660c99b4c085aa
> # Parent  4dca07be3c215ee7f47ccd7473d78514968f1bb4
> pycompat: custom implementation of urllib.parse.quote()
>
> urllib.parse.quote() accepts either str or bytes and returns str.

> There exists a urllib.parse.quote_from_bytes() which only accepts
> bytes. We should probably use that to retain strong typing and
> avoid surprises.
>
> In addition, since nearly all strings in Mercurial are bytes, we
> probably don't want quote() returning unicode.
>
> So, this patch implements a custom quote() that only accepts bytes
> and returns bytes. The quoted URL should only contain URL safe
> characters which is a strict subset of ASCII. So
> `.encode('ascii', 'strict')` should be safe.
>
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -269,7 +269,6 @@ if not ispy3:
>  else:
>      import urllib.parse
>      urlreq._registeraliases(urllib.parse, (
> -        "quote",
>          "splitattr",
>          "splitpasswd",
>          "splitport",
> @@ -313,3 +312,12 @@ else:
>          "SimpleHTTPRequestHandler",
>          "CGIHTTPRequestHandler",
>      ))
> +
> +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> +    # (if necessary), and returns str. This is wonky. We provide a custom
> +    # implementation that only accepts bytes and emits bytes.
> +    def quote(s, safe=r'/'):
> +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> +        return s.encode('ascii', 'strict')
> +
> +    urlreq.quote = quote
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pulkit Goyal - March 14, 2017, 1:51 a.m.
On 13-Mar-2017 6:48 PM, "Pulkit Goyal" <7895pulkit@gmail.com> wrote:
>
>
> On 13-Mar-2017 1:09 PM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:
> >
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489432607 25200
> > #      Mon Mar 13 12:16:47 2017 -0700
> > # Node ID ae4c0d8f0ca7218c8b1d259b72660c99b4c085aa
> > # Parent  4dca07be3c215ee7f47ccd7473d78514968f1bb4
> > pycompat: custom implementation of urllib.parse.quote()

Wow! Yesterday I saw this error and decided to fix this later. I didn't
knew 'hg init' was just behind this. Its great we have now init working.

PS: Sorry for last reply, I am on phone and I accidentally clicked on the
send button.
Yuya Nishihara - March 14, 2017, 8:43 a.m.
On Mon, 13 Mar 2017 13:08:51 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> #      Mon Mar 13 12:16:47 2017 -0700
> # Node ID ae4c0d8f0ca7218c8b1d259b72660c99b4c085aa
> # Parent  4dca07be3c215ee7f47ccd7473d78514968f1bb4
> pycompat: custom implementation of urllib.parse.quote()
> 
> urllib.parse.quote() accepts either str or bytes and returns str.
> 
> There exists a urllib.parse.quote_from_bytes() which only accepts
> bytes. We should probably use that to retain strong typing and
> avoid surprises.
> 
> In addition, since nearly all strings in Mercurial are bytes, we
> probably don't want quote() returning unicode.
> 
> So, this patch implements a custom quote() that only accepts bytes
> and returns bytes. The quoted URL should only contain URL safe
> characters which is a strict subset of ASCII. So
> `.encode('ascii', 'strict')` should be safe.
> 
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -269,7 +269,6 @@ if not ispy3:
>  else:
>      import urllib.parse
>      urlreq._registeraliases(urllib.parse, (
> -        "quote",
>          "splitattr",
>          "splitpasswd",
>          "splitport",
> @@ -313,3 +312,12 @@ else:
>          "SimpleHTTPRequestHandler",
>          "CGIHTTPRequestHandler",
>      ))
> +
> +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> +    # (if necessary), and returns str. This is wonky. We provide a custom
> +    # implementation that only accepts bytes and emits bytes.
> +    def quote(s, safe=r'/'):
> +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> +        return s.encode('ascii', 'strict')
> +
> +    urlreq.quote = quote

unquote() is exported as pycompat.urlunquote(). Maybe we should choose one
way.
Gregory Szorc - March 22, 2017, 6 a.m.
On Tue, Mar 14, 2017 at 1:43 AM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Mon, 13 Mar 2017 13:08:51 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489432607 25200
> > #      Mon Mar 13 12:16:47 2017 -0700
> > # Node ID ae4c0d8f0ca7218c8b1d259b72660c99b4c085aa
> > # Parent  4dca07be3c215ee7f47ccd7473d78514968f1bb4
> > pycompat: custom implementation of urllib.parse.quote()
> >
> > urllib.parse.quote() accepts either str or bytes and returns str.
> >
> > There exists a urllib.parse.quote_from_bytes() which only accepts
> > bytes. We should probably use that to retain strong typing and
> > avoid surprises.
> >
> > In addition, since nearly all strings in Mercurial are bytes, we
> > probably don't want quote() returning unicode.
> >
> > So, this patch implements a custom quote() that only accepts bytes
> > and returns bytes. The quoted URL should only contain URL safe
> > characters which is a strict subset of ASCII. So
> > `.encode('ascii', 'strict')` should be safe.
> >
> > diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> > --- a/mercurial/pycompat.py
> > +++ b/mercurial/pycompat.py
> > @@ -269,7 +269,6 @@ if not ispy3:
> >  else:
> >      import urllib.parse
> >      urlreq._registeraliases(urllib.parse, (
> > -        "quote",
> >          "splitattr",
> >          "splitpasswd",
> >          "splitport",
> > @@ -313,3 +312,12 @@ else:
> >          "SimpleHTTPRequestHandler",
> >          "CGIHTTPRequestHandler",
> >      ))
> > +
> > +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> > +    # (if necessary), and returns str. This is wonky. We provide a
> custom
> > +    # implementation that only accepts bytes and emits bytes.
> > +    def quote(s, safe=r'/'):
> > +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> > +        return s.encode('ascii', 'strict')
> > +
> > +    urlreq.quote = quote
>
> unquote() is exported as pycompat.urlunquote(). Maybe we should choose one
> way.
>

I thought this change was trivial. But it turned into an 8 part series
which I just patchpombed. Thanks for spotting the redundancy.

Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -269,7 +269,6 @@  if not ispy3:
 else:
     import urllib.parse
     urlreq._registeraliases(urllib.parse, (
-        "quote",
         "splitattr",
         "splitpasswd",
         "splitport",
@@ -313,3 +312,12 @@  else:
         "SimpleHTTPRequestHandler",
         "CGIHTTPRequestHandler",
     ))
+
+    # urllib.parse.quote() accepts both str and bytes, decodes bytes
+    # (if necessary), and returns str. This is wonky. We provide a custom
+    # implementation that only accepts bytes and emits bytes.
+    def quote(s, safe=r'/'):
+        s = urllib.parse.quote_from_bytes(s, safe=safe)
+        return s.encode('ascii', 'strict')
+
+    urlreq.quote = quote