Patchwork D2472: pycompat: prevent encoding or decoding values if not required

login
register
mail settings
Submitter phabricator
Date Feb. 27, 2018, 9:36 a.m.
Message ID <differential-rev-PHID-DREV-tx2sjqsufghm6b6qq7hf-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/28437/
State Superseded
Headers show

Comments

phabricator - Feb. 27, 2018, 9:36 a.m.
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  pycompat.py has functions strurl and bytesurl which decodes and encodes the url
  passed on Python 3 respectively. In some cases, strurl gets a url which is
  already str and bytesurl gets a url which is already bytes. Let's prevent
  encoding or decoding the values again if not required.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2472

AFFECTED FILES
  mercurial/pycompat.py

CHANGE DETAILS




To: pulkit, #hg-reviewers
Cc: mercurial-devel
phabricator - Feb. 27, 2018, 12:51 p.m.
yuja requested changes to this revision.
yuja added a comment.
This revision now requires changes to proceed.


  I'm 0 for this move.
  
  I'm not a fan of making function inputs and outputs obscure because we'll
  be likely to have a hard time to fix it in future. Mixing bytes and unicodes is
  horrible. But if it's painful to be strict on url type, I'll say go for it.

INLINE COMMENTS

> pycompat.py:197
> +            return url.decode(u'ascii')
>  
>      def bytesurl(url):

Need to return an unmodified value.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2472

To: pulkit, #hg-reviewers, yuja
Cc: yuja, mercurial-devel
phabricator - Feb. 27, 2018, 9:50 p.m.
indygreg added a comment.


  I agree with @yuja here. I think we should strive to convert URLs to bytes (like most other internal types). Although I do realize that lots of stdlib code has... expectations. Is there a particular test failure you are tracking down with this fix? I've kind of got a lot of the wire protocol code paged in right now. I could potentially offer some advice.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2472

To: pulkit, #hg-reviewers, yuja
Cc: indygreg, yuja, mercurial-devel
phabricator - March 3, 2018, 5:38 p.m.
durin42 added a comment.


  In https://phab.mercurial-scm.org/D2472#40379, @indygreg wrote:
  
  > I agree with @yuja here. I think we should strive to convert URLs to bytes (like most other internal types). Although I do realize that lots of stdlib code has... expectations. Is there a particular test failure you are tracking down with this fix? I've kind of got a lot of the wire protocol code paged in right now. I could potentially offer some advice.
  
  
  test-http-bundle1.t failures. I think this current patch is probably the "least bad" option (barring getting pytype working, which I think would let us try and constrain some types in more principled ways) since some of these URL-relevant functions are called by both hg and urllib guts. The former wants to use bytes, and the latter can only use unicodes.
  
  I've poked at this on and off a few times and something like this fix is all I've come up with. I'm not happy about it either.
  
  The other option is probably to always treat URLs as unicodes, though that seems like it'll probably come with its own bag of hurt throughout the codebase.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2472

To: pulkit, #hg-reviewers, yuja
Cc: durin42, indygreg, yuja, mercurial-devel

Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -192,11 +192,13 @@ 
 
     def strurl(url):
         """Converts a bytes url back to str"""
-        return url.decode(u'ascii')
+        if isinstance(url, bytes):
+            return url.decode(u'ascii')
 
     def bytesurl(url):
         """Converts a str url to bytes by encoding in ascii"""
-        return url.encode(u'ascii')
+        if isinstance(url, str):
+            return url.encode(u'ascii')
 
     def raisewithtb(exc, tb):
         """Raise exception with the given traceback"""