Patchwork D7061: convert: don't pass bytes to, or expect bytes from, emailparser

login
register
mail settings
Submitter phabricator
Date Oct. 11, 2019, 5:12 p.m.
Message ID <differential-rev-PHID-DREV-zse4n7extk3zuboibbsj-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/42220/
State Superseded
Headers show

Comments

phabricator - Oct. 11, 2019, 5:12 p.m.
Kwan created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7061

AFFECTED FILES
  hgext/convert/gnuarch.py

CHANGE DETAILS




To: Kwan, #hg-reviewers
Cc: mercurial-devel
phabricator - Oct. 11, 2019, 6:23 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> gnuarch.py:301
>          try:
> -            catlog = self.catlogparser.parsestr(data)
> +            catlog = self.catlogparser.parsestr(encoding.unifromlocal(data))
>  

Is it right to depend on the user's preferred encoding (as I think `unifromlocal()` does)? Would it make sense to instead initialize `self.catlogparser = emailparser.BytesParser()`?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7061/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7061

To: Kwan, #hg-reviewers
Cc: martinvonz, mercurial-devel
phabricator - Oct. 12, 2019, 2:36 p.m.
Kwan added inline comments.

INLINE COMMENTS

> martinvonz wrote in gnuarch.py:301
> Is it right to depend on the user's preferred encoding (as I think `unifromlocal()` does)? Would it make sense to instead initialize `self.catlogparser = emailparser.BytesParser()`?

Hmm, I wasn't aware of that drawback of `unifromlocal()` if it's true.  Is there a canonical "Give me unicode from these mercurial bytes" function?  Regardless, `BytesParser` does sound handy, and is even in 3.5 <https://docs.python.org/3.5/library/email.parser.html#email.parser.BytesParser>, but isn't present in 2.7.  Would doing it conditionally be alright?  (and a conditional alias for parsebytes)

  self.catlogparser = (
      emailparser.BytesParser()
      if pycompat.ispy3
      else emailparser.Parser()
  )
  if not pycompat.ispy3:
      self.catlogparser.parsebytes = self.catlogparser.parsestr

  -            catlog = self.catlogparser.parsestr(data)
  +            catlog = self.catlogparser.parsebytes(data)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7061/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7061

To: Kwan, #hg-reviewers
Cc: martinvonz, mercurial-devel
phabricator - Oct. 16, 2019, 8:43 p.m.
durin42 added inline comments.

INLINE COMMENTS

> Kwan wrote in gnuarch.py:301
> Hmm, I wasn't aware of that drawback of `unifromlocal()` if it's true.  Is there a canonical "Give me unicode from these mercurial bytes" function?  Regardless, `BytesParser` does sound handy, and is even in 3.5 <https://docs.python.org/3.5/library/email.parser.html#email.parser.BytesParser>, but isn't present in 2.7.  Would doing it conditionally be alright?  (and a conditional alias for parsebytes)
> 
>   self.catlogparser = (
>       emailparser.BytesParser()
>       if pycompat.ispy3
>       else emailparser.Parser()
>   )
>   if not pycompat.ispy3:
>       self.catlogparser.parsebytes = self.catlogparser.parsestr
> 
>   -            catlog = self.catlogparser.parsestr(data)
>   +            catlog = self.catlogparser.parsebytes(data)

It depends which bytes, basically. Some bytes in hg are known to be UTF-8, but anytime we have file contents or filenames we don't know.

I'm not sure of the context here, but maybe the output from tla is in some known encoding?

(I'm also open to the idea of dropping tla convert support as we move to Python 3, as tla has been obsolete for a _long_ time.)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7061/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7061

To: Kwan, #hg-reviewers
Cc: durin42, martinvonz, mercurial-devel
phabricator - Nov. 4, 2019, 7:28 p.m.
Kwan added a comment.
Kwan abandoned this revision.


  Obsoleted by a better fix in cf3bf3b03445 <https://phab.mercurial-scm.org/rHGcf3bf3b03445b79997905a4aafb6c4ae14cf7c3f>.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7061/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7061

To: Kwan, #hg-reviewers
Cc: durin42, martinvonz, mercurial-devel

Patch

diff --git a/hgext/convert/gnuarch.py b/hgext/convert/gnuarch.py
--- a/hgext/convert/gnuarch.py
+++ b/hgext/convert/gnuarch.py
@@ -298,26 +298,26 @@ 
 
     def _parsecatlog(self, data, rev):
         try:
-            catlog = self.catlogparser.parsestr(data)
+            catlog = self.catlogparser.parsestr(encoding.unifromlocal(data))
 
             # Commit date
             self.changes[rev].date = dateutil.datestr(
-                dateutil.strdate(catlog[b'Standard-date'], b'%Y-%m-%d %H:%M:%S')
+                dateutil.strdate(catlog['Standard-date'], b'%Y-%m-%d %H:%M:%S')
             )
 
             # Commit author
-            self.changes[rev].author = self.recode(catlog[b'Creator'])
+            self.changes[rev].author = self.recode(catlog['Creator'])
 
             # Commit description
-            self.changes[rev].summary = b'\n\n'.join(
-                (catlog[b'Summary'], catlog.get_payload())
+            self.changes[rev].summary = '\n\n'.join(
+                (catlog['Summary'], catlog.get_payload())
             )
             self.changes[rev].summary = self.recode(self.changes[rev].summary)
 
             # Commit revision origin when dealing with a branch or tag
-            if b'Continuation-of' in catlog:
+            if 'Continuation-of' in catlog:
                 self.changes[rev].continuationof = self.recode(
-                    catlog[b'Continuation-of']
+                    catlog['Continuation-of']
                 )
         except Exception:
             raise error.Abort(_(b'could not parse cat-log of %s') % rev)