Submitter | Gregory Szorc |
---|---|
Date | March 13, 2017, 7:27 p.m. |
Message ID | <e6b177a3a7662366e8af.1489433243@ubuntu-vm-main> |
Permalink | /patch/19299/ |
State | Accepted |
Comments
On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> # Mon Mar 13 12:16:47 2017 -0700
> # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> # Parent 2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> pycompat: custom implementation of urllib.parse.quote()

Queued these, thanks.

>
> urllib.parse.quote() accepts either str or bytes and returns str.
>
> There exists a urllib.parse.quote_from_bytes() which only accepts
> bytes. We should probably use that to retain strong typing and
> avoid surprises.
>
> In addition, since nearly all strings in Mercurial are bytes, we
> probably don't want quote() returning unicode.
>
> So, this patch implements a custom quote() that only accepts bytes
> and returns bytes. The quoted URL should only contain URL safe
> characters which is a strict subset of ASCII. So
> `.encode('ascii', 'strict')` should be safe.
>
> After this patch, `hg init <path>` works on Python 3.5!
>
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -269,7 +269,6 @@ if not ispy3:
>  else:
>      import urllib.parse
>      urlreq._registeraliases(urllib.parse, (
> -        "quote",
>          "splitattr",
>          "splitpasswd",
>          "splitport",
> @@ -313,3 +312,12 @@ else:
>          "SimpleHTTPRequestHandler",
>          "CGIHTTPRequestHandler",
>      ))
> +
> +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> +    # (if necessary), and returns str. This is wonky. We provide a custom
> +    # implementation that only accepts bytes and emits bytes.
> +    def quote(s, safe=r'/'):
> +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> +        return s.encode('ascii', 'strict')
> +
> +    urlreq.quote = quote
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1489432607 25200
> # Mon Mar 13 12:16:47 2017 -0700
> # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> # Parent 2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> pycompat: custom implementation of urllib.parse.quote()

Bah, these no longer apply. Can you rebase and resend? (cc me for
faster turnaround so they don't bitrot again.)

>
> urllib.parse.quote() accepts either str or bytes and returns str.
>
> There exists a urllib.parse.quote_from_bytes() which only accepts
> bytes. We should probably use that to retain strong typing and
> avoid surprises.
>
> In addition, since nearly all strings in Mercurial are bytes, we
> probably don't want quote() returning unicode.
>
> So, this patch implements a custom quote() that only accepts bytes
> and returns bytes. The quoted URL should only contain URL safe
> characters which is a strict subset of ASCII. So
> `.encode('ascii', 'strict')` should be safe.
>
> After this patch, `hg init <path>` works on Python 3.5!
>
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -269,7 +269,6 @@ if not ispy3:
>  else:
>      import urllib.parse
>      urlreq._registeraliases(urllib.parse, (
> -        "quote",
>          "splitattr",
>          "splitpasswd",
>          "splitport",
> @@ -313,3 +312,12 @@ else:
>          "SimpleHTTPRequestHandler",
>          "CGIHTTPRequestHandler",
>      ))
> +
> +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> +    # (if necessary), and returns str. This is wonky. We provide a custom
> +    # implementation that only accepts bytes and emits bytes.
> +    def quote(s, safe=r'/'):
> +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> +        return s.encode('ascii', 'strict')
> +
> +    urlreq.quote = quote
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Tue, Mar 21, 2017 at 2:44 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489432607 25200
> > # Mon Mar 13 12:16:47 2017 -0700
> > # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> > # Parent 2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> > pycompat: custom implementation of urllib.parse.quote()
>
> Bah, these no longer apply. Can you rebase and resend? (cc me for
> faster turnaround so they don't bitrot again.)
>

I'm pretty sure that's because they are already published as 1ed169c5e235
and fb1f70331ee6.

>
> >
> > urllib.parse.quote() accepts either str or bytes and returns str.
> >
> > There exists a urllib.parse.quote_from_bytes() which only accepts
> > bytes. We should probably use that to retain strong typing and
> > avoid surprises.
> >
> > In addition, since nearly all strings in Mercurial are bytes, we
> > probably don't want quote() returning unicode.
> >
> > So, this patch implements a custom quote() that only accepts bytes
> > and returns bytes. The quoted URL should only contain URL safe
> > characters which is a strict subset of ASCII. So
> > `.encode('ascii', 'strict')` should be safe.
> >
> > After this patch, `hg init <path>` works on Python 3.5!
> >
> > diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> > --- a/mercurial/pycompat.py
> > +++ b/mercurial/pycompat.py
> > @@ -269,7 +269,6 @@ if not ispy3:
> >  else:
> >      import urllib.parse
> >      urlreq._registeraliases(urllib.parse, (
> > -        "quote",
> >          "splitattr",
> >          "splitpasswd",
> >          "splitport",
> > @@ -313,3 +312,12 @@ else:
> >          "SimpleHTTPRequestHandler",
> >          "CGIHTTPRequestHandler",
> >      ))
> > +
> > +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> > +    # (if necessary), and returns str. This is wonky. We provide a custom
> > +    # implementation that only accepts bytes and emits bytes.
> > +    def quote(s, safe=r'/'):
> > +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> > +        return s.encode('ascii', 'strict')
> > +
> > +    urlreq.quote = quote
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
On Mar 22, 2017 12:52 AM, "Gregory Szorc" <gregory.szorc@gmail.com> wrote:

On Tue, Mar 21, 2017 at 2:44 PM, Augie Fackler <raf@durin42.com> wrote:

> On Mon, Mar 13, 2017 at 12:27:23PM -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1489432607 25200
> > # Mon Mar 13 12:16:47 2017 -0700
> > # Node ID e6b177a3a7662366e8af0ba07ccbd9509a8cc647
> > # Parent 2b9547ebdfa84c3e96fd366e3c09dd24306747d4
> > pycompat: custom implementation of urllib.parse.quote()
>
> Bah, these no longer apply. Can you rebase and resend? (cc me for
> faster turnaround so they don't bitrot again.)
>

I'm pretty sure that's because they are already published as 1ed169c5e235
and fb1f70331ee6.


Well that'd do it. Sigh.

> >
> > urllib.parse.quote() accepts either str or bytes and returns str.
> >
> > There exists a urllib.parse.quote_from_bytes() which only accepts
> > bytes. We should probably use that to retain strong typing and
> > avoid surprises.
> >
> > In addition, since nearly all strings in Mercurial are bytes, we
> > probably don't want quote() returning unicode.
> >
> > So, this patch implements a custom quote() that only accepts bytes
> > and returns bytes. The quoted URL should only contain URL safe
> > characters which is a strict subset of ASCII. So
> > `.encode('ascii', 'strict')` should be safe.
> >
> > After this patch, `hg init <path>` works on Python 3.5!
> >
> > diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> > --- a/mercurial/pycompat.py
> > +++ b/mercurial/pycompat.py
> > @@ -269,7 +269,6 @@ if not ispy3:
> >  else:
> >      import urllib.parse
> >      urlreq._registeraliases(urllib.parse, (
> > -        "quote",
> >          "splitattr",
> >          "splitpasswd",
> >          "splitport",
> > @@ -313,3 +312,12 @@ else:
> >          "SimpleHTTPRequestHandler",
> >          "CGIHTTPRequestHandler",
> >      ))
> > +
> > +    # urllib.parse.quote() accepts both str and bytes, decodes bytes
> > +    # (if necessary), and returns str. This is wonky. We provide a custom
> > +    # implementation that only accepts bytes and emits bytes.
> > +    def quote(s, safe=r'/'):
> > +        s = urllib.parse.quote_from_bytes(s, safe=safe)
> > +        return s.encode('ascii', 'strict')
> > +
> > +    urlreq.quote = quote
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Patch
diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -269,7 +269,6 @@ if not ispy3:
 else:
     import urllib.parse
     urlreq._registeraliases(urllib.parse, (
-        "quote",
         "splitattr",
         "splitpasswd",
         "splitport",
@@ -313,3 +312,12 @@ else:
         "SimpleHTTPRequestHandler",
         "CGIHTTPRequestHandler",
     ))
+
+    # urllib.parse.quote() accepts both str and bytes, decodes bytes
+    # (if necessary), and returns str. This is wonky. We provide a custom
+    # implementation that only accepts bytes and emits bytes.
+    def quote(s, safe=r'/'):
+        s = urllib.parse.quote_from_bytes(s, safe=safe)
+        return s.encode('ascii', 'strict')
+
+    urlreq.quote = quote
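The typing difference the commit message describes can be checked outside Mercurial with a small standalone sketch. It copies the `quote()` body from the patch but defines it at module level rather than attaching it to `urlreq`; the sample inputs and the `rejects_str()` helper are illustrative assumptions, not part of the patch.

```python
import urllib.parse


def quote(s, safe=r'/'):
    # Same body as the patch: percent-encode bytes, then turn the
    # ASCII-only str result back into bytes.
    s = urllib.parse.quote_from_bytes(s, safe=safe)
    return s.encode('ascii', 'strict')


# The stdlib quote() accepts bytes but hands back str.
stdlib_result = urllib.parse.quote(b'a b/c')

# The bytes-only wrapper keeps the result as bytes.
custom_result = quote(b'a b/c')


def rejects_str():
    # Hypothetical helper for this sketch: quote_from_bytes() raises
    # TypeError on str input, so the wrapper rejects str as well.
    try:
        quote('a b')
    except TypeError:
        return True
    return False
```

Because `quote_from_bytes()` escapes every byte outside its safe set as `%XX`, the intermediate str holds only ASCII characters, which is why the strict ASCII encode in the patch should not fail.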