Patchwork [2,of,2,py3] pycompat: custom implementation of urllib.parse.quote()

Submitter Gregory Szorc
Date March 13, 2017, 7:27 p.m.
Message ID <e6b177a3a7662366e8af.1489433243@ubuntu-vm-main>
Permalink /patch/19299/
State Accepted

Comments

Gregory Szorc - March 13, 2017, 7:27 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1489432607 25200
#      Mon Mar 13 12:16:47 2017 -0700
# Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
# Parent  2b9547ebdfa84c3e96fd366e3c09dd24306747d4
pycompat: custom implementation of urllib.parse.quote()

urllib.parse.quote() accepts either str or bytes and returns str.

There exists a urllib.parse.quote_from_bytes() which only accepts
bytes. We should probably use that to retain strong typing and
avoid surprises.

In addition, since nearly all strings in Mercurial are bytes, we
probably don't want quote() returning unicode.

So, this patch implements a custom quote() that only accepts bytes
and returns bytes. The quoted URL should contain only URL-safe
characters, which are a strict subset of ASCII, so
`.encode('ascii', 'strict')` should be safe.

After this patch, `hg init <path>` works on Python 3.5!
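
The typing behavior described above can be checked with a short sketch against the Python 3 standard library. The sample inputs and variable names are illustrative only and are not taken from Mercurial.

```python
import urllib.parse

# quote() takes either str or bytes and returns str in both cases.
from_str = urllib.parse.quote('a b/c')
from_bytes = urllib.parse.quote(b'a b/c')

# quote_from_bytes() accepts only bytes; a str argument raises TypeError.
try:
    urllib.parse.quote_from_bytes('a b/c')
    rejected_str = False
except TypeError:
    rejected_str = True

# The quoted result contains only ASCII characters, so a strict
# ASCII encode turns it back into bytes without error.
as_bytes = urllib.parse.quote_from_bytes(b'caf\xc3\xa9 dir').encode('ascii', 'strict')

print(type(from_str), type(from_bytes), rejected_str, as_bytes)
```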
Augie Fackler - March 21, 2017, 9:44 p.m.
On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> #      Mon Mar 13 12:16:47 2017 -0700
> # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> # Parent  2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> pycompat: custom implementation of urllib.parse.quote()

Queued these, thanks.

Augie Fackler - March 21, 2017, 9:44 p.m.
On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> #      Mon Mar 13 12:16:47 2017 -0700
> # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> # Parent  2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> pycompat: custom implementation of urllib.parse.quote()

Bah, these no longer apply. Can you rebase and resend? (cc me for
faster turnaround so they don't bitrot again.)

Gregory Szorc - March 22, 2017, 4:52 a.m.
On Tue, Mar 21, 2017 at 2:44 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489432607 25200
> > #      Mon Mar 13 12:16:47 2017 -0700
> > # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> > # Parent  2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> > pycompat: custom implementation of urllib.parse.quote()
>
> Bah, these no longer apply. Can you rebase and resend? (cc me for
> faster turnaround so they don't bitrot again.)
>

I'm pretty sure that's because they are already published as 1ed169c5e235
and fb1f70331ee6.


Augie Fackler - March 22, 2017, 5:42 a.m.
On Mar 22, 2017 12:52 AM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:

> I'm pretty sure that's because they are already published as 1ed169c5e235
> and fb1f70331ee6.

Well that'd do it. Sigh.



Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -269,7 +269,6 @@  if not ispy3:
 else:
     import urllib.parse
     urlreq._registeraliases(urllib.parse, (
-        "quote",
         "splitattr",
         "splitpasswd",
         "splitport",
@@ -313,3 +312,12 @@  else:
         "SimpleHTTPRequestHandler",
         "CGIHTTPRequestHandler",
     ))
+
+    # urllib.parse.quote() accepts both str and bytes, decodes bytes
+    # (if necessary), and returns str. This is wonky. We provide a custom
+    # implementation that only accepts bytes and emits bytes.
+    def quote(s, safe=r'/'):
+        s = urllib.parse.quote_from_bytes(s, safe=safe)
+        return s.encode('ascii', 'strict')
+
+    urlreq.quote = quote
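
For readers who want to try the helper outside of Mercurial, here is a minimal standalone sketch of the same idea. It assumes Python 3 and omits the `urlreq` alias registration from `pycompat.py`. The sample paths are made up for illustration.

```python
import urllib.parse


def quote(s, safe='/'):
    """Percent-encode bytes and return bytes.

    Standalone sketch mirroring the patch: quote_from_bytes() rejects
    str input with TypeError, and its str result is pure ASCII, so the
    strict encode below is not expected to fail.
    """
    quoted = urllib.parse.quote_from_bytes(s, safe=safe)
    return quoted.encode('ascii', 'strict')


# Bytes in, bytes out; '/' is left alone by default.
default_safe = quote(b'repo name/sub')

# With an empty safe set, '/' is percent-encoded as well.
nothing_safe = quote(b'repo name/sub', safe='')

print(default_safe, nothing_safe)
```

Leaving the type check to `quote_from_bytes()` keeps the wrapper as small as the one in the patch. A caller that passes `str` gets a `TypeError` rather than a silently converted value.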