Patchwork [4,of,6,stable] convert: convert URLs to UTF-8 for Subversion

Submitter Manuel Jacob
Date June 30, 2020, 6:45 a.m.
Message ID <388217b58dc5aa2e4b34.1593499545@tmp>
Permalink /patch/46597/
State Accepted

Comments

Manuel Jacob - June 30, 2020, 6:45 a.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1593487847 -7200
#      Tue Jun 30 05:30:47 2020 +0200
# Branch stable
# Node ID 388217b58dc5aa2e4b34b8fab67c17016abcc005
# Parent  7bd48930ea77337213859f562e2fc0abd6734830
# EXP-Topic svn_encoding
convert: convert URLs to UTF-8 for Subversion

Preamble: for comprehension, note that the `path` parameter of geturl() would
be better named `path_or_url` (the argument of the call of getsvn() is called
`url`).

For HTTP(S) URLs, the changes make no difference, as such URLs are restricted
to ASCII.

For file URLs, the reasoning is the same as for paths: we have to roundtrip with
what Subversion is doing.

Previously, with ISO-8859-15 as the locale encoding, trying to convert the SVN
repo `file:///tmp/a€` failed like this:

file:///tmp/a%A4 does not look like a Subversion repository to libsvn version 1.14.0
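
As a rough illustration of the recoding involved, the sketch below takes the locale-encoded bytes of the path from the example and re-encodes them as UTF-8 before percent-encoding. The helper name `recode_to_utf8` is hypothetical and is not the patch's `fs2svn()`; it only shows why the `%A4` form differs from what a UTF-8 consumer would expect.

```python
from urllib.parse import quote


def recode_to_utf8(raw: bytes, locale_encoding: str) -> bytes:
    """Hypothetical helper: reinterpret locale-encoded bytes as UTF-8 bytes."""
    return raw.decode(locale_encoding).encode("utf-8")


# "/tmp/a€" as it appears on disk under an ISO-8859-15 locale.
raw_path = b"/tmp/a\xa4"
utf8_path = recode_to_utf8(raw_path, "iso-8859-15")

# Percent-encoding the raw bytes yields the form seen in the error message.
print(quote(raw_path))   # /tmp/a%A4
# Percent-encoding the UTF-8 bytes yields the Unicode-based form.
print(quote(utf8_path))  # /tmp/a%E2%82%AC
```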

Decoding the path using the locale encoding can fail. In this case, we have to
bail out, as Subversion won’t be able to do anything useful with the path.
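
The bail-out condition can be sketched in isolation as follows. This is a minimal standalone example, not the code from the patch: `can_decode` is a hypothetical name, and the encodings are passed explicitly rather than taken from the module-level `fsencoding`.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# 0xff is not valid ASCII, so this URL would be rejected.
print(can_decode(b"file:///tmp/\xff", "ascii"))         # False
# 0xa4 is the euro sign in ISO-8859-15, so this one decodes fine.
print(can_decode(b"file:///tmp/a\xa4", "iso-8859-15"))  # True
# The same byte is a stray continuation byte in UTF-8.
print(can_decode(b"file:///tmp/a\xa4", "utf-8"))        # False
```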

Patch

diff --git a/hgext/convert/subversion.py b/hgext/convert/subversion.py
--- a/hgext/convert/subversion.py
+++ b/hgext/convert/subversion.py
@@ -65,10 +65,10 @@ 
     svn = None
 
 
-# In Subversion, paths are Unicode (encoded as UTF-8), which Subversion
-# converts from / to native strings when interfacing with the OS. When passing
-# paths to Subversion, we have to recode them such that it roundstrips with
-# what Subversion is doing.
+# In Subversion, paths and URLs are Unicode (encoded as UTF-8), which
+# Subversion converts from / to native strings when interfacing with the OS.
+# When passing paths and URLs to Subversion, we have to recode them such that
+# it roundtrips with what Subversion is doing.
 
 fsencoding = None
 
@@ -141,7 +141,9 @@ 
 
 def geturl(path):
     try:
-        return svn.client.url_from_path(svn.core.svn_path_canonicalize(path))
+        return svn.client.url_from_path(
+            svn.core.svn_path_canonicalize(fs2svn(path))
+        )
     except svn.core.SubversionException:
         # svn.client.url_from_path() fails with local repositories
         pass
@@ -358,6 +360,19 @@ 
                 and path[2:6].lower() == b'%3a/'
             ):
                 path = path[:2] + b':/' + path[6:]
+            try:
+                path.decode(fsencoding)
+            except UnicodeDecodeError:
+                ui.warn(
+                    _(
+                        b'Subversion requires that file URLs can be converted '
+                        b'to Unicode using the current locale encoding (%s)\n'
+                    )
+                    % pycompat.sysbytes(fsencoding)
+                )
+                return False
+            # FIXME: The following reasoning and logic is wrong and will be
+            # fixed in a following changeset.
             # pycompat.fsdecode() / pycompat.fsencode() are used so that bytes
             # in the URL roundtrip correctly on Unix. urlreq.url2pathname() on
             # py3 will decode percent-encoded bytes using the utf-8 encoding
diff --git a/tests/test-convert-svn-encoding.t b/tests/test-convert-svn-encoding.t
--- a/tests/test-convert-svn-encoding.t
+++ b/tests/test-convert-svn-encoding.t
@@ -182,6 +182,20 @@ 
   cannot find required "p4" tool
   abort: \xff: missing or unsupported repository (glob) (esc)
   [255]
+  $ hg convert file://$TESTTMP/$XFF test
+  initializing destination test repository
+  Subversion requires that file URLs can be converted to Unicode using the current locale encoding (ascii)
+  file:/*/$TESTTMP/\xff does not look like a CVS checkout (glob) (esc)
+  $TESTTMP/file:$TESTTMP/\xff does not look like a Git repository (esc)
+  file:/*/$TESTTMP/\xff does not look like a Subversion repository (glob) (esc)
+  file:/*/$TESTTMP/\xff is not a local Mercurial repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a darcs repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a monotone repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a GNU Arch repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a Bazaar repository (glob) (esc)
+  file:/*/$TESTTMP/\xff does not look like a P4 repository (glob) (esc)
+  abort: file:/*/$TESTTMP/\xff: missing or unsupported repository (glob) (esc)
+  [255]
 
 #if py3
 For now, on Python 3, we abort when encountering non-UTF-8 percent-encoded