Patchwork D7054: phabricator: treat non-utf-8 text files as binary as phabricator requires

Submitter phabricator
Date Oct. 10, 2019, 9:53 p.m.
Message ID <differential-rev-PHID-DREV-4tfej3kqfs45k5rbdc2j-req@mercurial-scm.org>
Permalink /patch/42210/
State Superseded

Comments

phabricator - Oct. 10, 2019, 9:53 p.m.
Kwan created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Phabricator can't cope with text files that are not UTF-8, so it requires them
  to be submitted as binary files instead.  This has the unfortunate effect of
  making them practically unreviewable in Phabricator, since it will only display
  the separate versions of the file in other views, not a diff.  Running
  `phabread` on such submissions is similarly limited, since it just outputs the
  binary patch, but `hg import` copes with it fine and a subsequent `hg diff`
  shows the actual changes.  It is still a marked improvement over trying to
  submit them as text, which just leads to corruption (depending on context,
  Phabricator outputs either ? or HTML entities for non-UTF-8 characters).
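  Why forcing such a file through a UTF-8 text path is lossy can be shown with
  Python's own codecs. This is an illustration of the general problem, not
  Phabricator's code: a Latin-1 byte fails a strict UTF-8 decode, and a lenient
  decode replaces it, so the original bytes cannot be recovered.

```python
# Illustration only: uses Python's codecs, not Phabricator's internals.
# 'café\n' stored as Latin-1, so 'é' is the single byte 0xE9.
latin1_bytes = "café\n".encode("latin-1")

# A strict UTF-8 decode rejects the file; this is what the patch relies on.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode substitutes U+FFFD for the invalid byte...
lossy_text = latin1_bytes.decode("utf-8", errors="replace")
# ...and re-encoding does not give back the original bytes: the data is corrupted.
roundtrip = lossy_text.encode("utf-8")

print(strict_ok)                  # False
print(lossy_text == "caf\ufffd\n")  # True
print(roundtrip == latin1_bytes)  # False
```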
  
  Running decode on the whole file like this seems slightly unfortunate, but I'm
  not aware of a better way.
  The same check also has to be applied to the p1() version, to detect
  conversions to UTF-8.
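  The detection reduces to a strict decode of both sides of the change. A
  minimal sketch on plain bytes follows; `is_not_utf8` is a hypothetical
  stand-alone helper that mirrors the patch's `notutf8()` but takes raw data
  instead of a Mercurial filectx, with `old_data` standing in for
  `fctx.p1().data()`.

```python
def is_not_utf8(new_data, old_data=None):
    """Return True if either revision of a file is not valid UTF-8.

    Hypothetical helper mirroring notutf8() from the patch. old_data plays
    the role of fctx.p1().data() and is None when there is no parent.
    """
    try:
        # Decoding the whole file is O(size), but it is a simple, exact test.
        new_data.decode("utf-8")
        if old_data is not None:
            # Checking the old side too catches files converted *to* UTF-8,
            # whose removed lines would still be non-UTF-8 in the diff.
            old_data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


print(is_not_utf8(b"hello\n"))                    # False
print(is_not_utf8("café\n".encode("latin-1")))    # True
print(is_not_utf8("café\n".encode("utf-8"),
                  "café\n".encode("latin-1")))    # True: converted to UTF-8
```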

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7054

AFFECTED FILES
  hgext/phabricator.py

CHANGE DETAILS




To: Kwan, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/hgext/phabricator.py b/hgext/phabricator.py
--- a/hgext/phabricator.py
+++ b/hgext/phabricator.py
@@ -695,6 +695,23 @@ 
 gitmode = {b'l': b'120000', b'x': b'100755', b'': b'100644'}
 
 
+def notutf8(fctx):
+    """detect non-UTF-8 text files since Phabricator requires them to be marked
+    as binary
+    """
+    try:
+        fctx.data().decode('utf-8')
+        if fctx.parents():
+            fctx.p1().data().decode('utf-8')
+        return False
+    except UnicodeDecodeError:
+        fctx.repo().ui.write(
+            _(b'file %s detected as non-UTF-8, marked as binary\n')
+            % fctx.path()
+        )
+        return True
+
+
 def addremoved(pdiff, ctx, removed):
     """add removed files to the phabdiff. Shouldn't include moves"""
     for fname in removed:
@@ -703,7 +720,7 @@ 
         )
         pchange.addoldmode(gitmode[ctx.p1()[fname].flags()])
         fctx = ctx.p1()[fname]
-        if not fctx.isbinary():
+        if not (fctx.isbinary() or notutf8(fctx)):
             maketext(pchange, ctx, fname)
 
         pdiff.addchange(pchange)
@@ -720,7 +737,7 @@ 
             pchange.addoldmode(originalmode)
             pchange.addnewmode(filemode)
 
-        if fctx.isbinary():
+        if fctx.isbinary() or notutf8(fctx):
             makebinary(pchange, fctx)
             addoldbinary(pchange, fctx, fname)
         else:
@@ -780,7 +797,7 @@ 
             pchange.addnewmode(gitmode[fctx.flags()])
             pchange.type = DiffChangeType.ADD
 
-        if fctx.isbinary():
+        if fctx.isbinary() or notutf8(fctx):
             makebinary(pchange, fctx)
             if renamed:
                 addoldbinary(pchange, fctx, originalfname)
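The condition the patch adds at each call site (`fctx.isbinary() or notutf8(fctx)`) can be exercised outside Mercurial with a stub. In this sketch `FakeFctx` and `phabricator_kind` are hypothetical names, and `isbinary()` is approximated by a NUL-byte check, which is assumed here to match Mercurial's binary heuristic closely enough for illustration.

```python
class FakeFctx:
    """Hypothetical stand-in for a Mercurial filectx (only what notutf8 needs)."""

    def __init__(self, data, parent=None):
        self._data = data
        self._parent = parent

    def data(self):
        return self._data

    def isbinary(self):
        # Assumed approximation of Mercurial's heuristic: NUL byte => binary.
        return b"\0" in self._data

    def parents(self):
        return [self._parent] if self._parent is not None else []

    def p1(self):
        return self._parent


def notutf8(fctx):
    # Same logic as the patch, minus the ui.write() message.
    try:
        fctx.data().decode("utf-8")
        if fctx.parents():
            fctx.p1().data().decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


def phabricator_kind(fctx):
    """Which branch the patched code takes: makebinary() or maketext()."""
    return "binary" if fctx.isbinary() or notutf8(fctx) else "text"


print(phabricator_kind(FakeFctx(b"plain ascii\n")))                # text
print(phabricator_kind(FakeFctx(b"\x00\x01\x02")))                 # binary
print(phabricator_kind(FakeFctx("naïve\n".encode("latin-1"))))     # binary
print(phabricator_kind(FakeFctx("naïve\n".encode("utf-8"),
                                FakeFctx("naïve\n".encode("latin-1")))))  # binary
```

Because `isbinary()` is evaluated first, `notutf8()` (and its decode of the whole file) only runs for files that already look like text, so genuine binaries never trigger the "detected as non-UTF-8" message.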