Patchwork [STABLE,BC,V4] convert: when converting from Perforce use original local encoding by default

login
register
mail settings
Submitter Eugene Baranov
Date July 28, 2015, 5:44 p.m.
Message ID <38562df55db33073e970.1438105455@ADNADTX6400256.eng.citrite.net>
Download mbox | patch
Permalink /patch/10069/
State Accepted
Headers show

Comments

Eugene Baranov - July 28, 2015, 5:44 p.m.
# HG changeset patch
# User Eugene Baranov <eug.baranov@gmail.com>
# Date 1437580631 -3600
#      Wed Jul 22 16:57:11 2015 +0100
# Branch stable
# Node ID 38562df55db33073e9709efb1f567ee877050dcc
# Parent  3e84f40232c7931dbeca16e563ae8f63ca12cf4d
convert: when converting from Perforce use original local encoding by default

On Windows Perforce command line client uses default system locale to encode
output. Using 'latin_1' causes locale-specific characters to be replaced with
question marks. With this patch we will use default locale by default whilst
allowing to specify it explicity with 'convert.p4.encoding' config option.

This is a potentially breaking change for any scripts relying on output treated
as in 'latin_1' encoding.

Also because hgext.convert.convcmd overwrites detected default system locale
with UTF-8 we had to introduce an import cycle in hgext.convert.p4 to retrieve
originally detected encoding from hgext.convert.convcmd.
Eugene Baranov - July 30, 2015, 6:13 p.m.
This should pass the tests (although it was painful to find which are
failing using Windows).

On 28 July 2015 at 18:44, Eugene Baranov <eug.baranov@gmail.com> wrote:
> # HG changeset patch
> # User Eugene Baranov <eug.baranov@gmail.com>
> # Date 1437580631 -3600
> #      Wed Jul 22 16:57:11 2015 +0100
> # Branch stable
> # Node ID 38562df55db33073e9709efb1f567ee877050dcc
> # Parent  3e84f40232c7931dbeca16e563ae8f63ca12cf4d
> convert: when converting from Perforce use original local encoding by default
>
> On Windows Perforce command line client uses default system locale to encode
> output. Using 'latin_1' causes locale-specific characters to be replaced with
> question marks. With this patch we will use default locale by default whilst
> allowing to specify it explicity with 'convert.p4.encoding' config option.
>
> This is a potentially breaking change for any scripts relying on output treated
> as in 'latin_1' encoding.
>
> Also because hgext.convert.convcmd overwrites detected default system locale
> with UTF-8 we had to introduce an import cycle in hgext.convert.p4 to retrieve
> originally detected encoding from hgext.convert.convcmd.
>
> diff -r 3e84f40232c7 -r 38562df55db3 hgext/convert/__init__.py
> --- a/hgext/convert/__init__.py Sun Jul 26 09:28:52 2015 -0300
> +++ b/hgext/convert/__init__.py Wed Jul 22 16:57:11 2015 +0100
> @@ -323,8 +323,11 @@
>      usually should specify a target directory, because otherwise the
>      target may be named ``...-hg``.
>
> -    It is possible to limit the amount of source history to be
> -    converted by specifying an initial Perforce revision:
> +    The following options can be set with ``--config``:
> +
> +    :convert.p4.encoding: specify the encoding to use when decoding standard
> +        output of the Perforce command line tool. The default is default system
> +        encoding.
>
>      :convert.p4.startrev: specify initial Perforce revision (a
>          Perforce changelist number).
> diff -r 3e84f40232c7 -r 38562df55db3 hgext/convert/p4.py
> --- a/hgext/convert/p4.py       Sun Jul 26 09:28:52 2015 -0300
> +++ b/hgext/convert/p4.py       Wed Jul 22 16:57:11 2015 +0100
> @@ -9,6 +9,7 @@
>  from mercurial.i18n import _
>
>  from common import commit, converter_source, checktool, NoRepo
> +import convcmd
>  import marshal
>  import re
>
> @@ -54,7 +55,8 @@
>          self.tags = {}
>          self.lastbranch = {}
>          self.parent = {}
> -        self.encoding = "latin_1"
> +        self.encoding = self.ui.config('convert', 'p4.encoding',
> +                                       default=convcmd.orig_encoding)
>          self.depotname = {}           # mapping from local name to depot name
>          self.localname = {} # mapping from depot name to local name
>          self.re_type = re.compile(
> diff -r 3e84f40232c7 -r 38562df55db3 tests/test-convert.t
> --- a/tests/test-convert.t      Sun Jul 26 09:28:52 2015 -0300
> +++ b/tests/test-convert.t      Wed Jul 22 16:57:11 2015 +0100
> @@ -273,9 +273,12 @@
>        that when a depot path is given you then usually should specify a target
>        directory, because otherwise the target may be named "...-hg".
>
> -      It is possible to limit the amount of source history to be converted by
> -      specifying an initial Perforce revision:
> +      The following options can be set with "--config":
>
> +      convert.p4.encoding
> +                    specify the encoding to use when decoding standard output of
> +                    the Perforce command line tool. The default is default
> +                    system encoding.
>        convert.p4.startrev
>                      specify initial Perforce revision (a Perforce changelist
>                      number).
> diff -r 3e84f40232c7 -r 38562df55db3 tests/test-module-imports.t
> --- a/tests/test-module-imports.t       Sun Jul 26 09:28:52 2015 -0300
> +++ b/tests/test-module-imports.t       Wed Jul 22 16:57:11 2015 +0100
> @@ -127,6 +127,7 @@
>    mercurial/ui.py mixed imports
>       stdlib:    formatter
>       relative:  config, error, progress, scmutil, util
> +  Import cycle: hgext.convert.convcmd -> hgext.convert.p4 -> hgext.convert.convcmd
>    Import cycle: mercurial.cmdutil -> mercurial.context -> mercurial.subrepo -> mercurial.cmdutil
>    Import cycle: hgext.largefiles.basestore -> hgext.largefiles.localstore -> hgext.largefiles.basestore
>    Import cycle: mercurial.commands -> mercurial.commandserver -> mercurial.dispatch -> mercurial.commands
Matt Mackall - July 30, 2015, 11:08 p.m.
On Tue, 2015-07-28 at 18:44 +0100, Eugene Baranov wrote:
> # HG changeset patch
> # User Eugene Baranov <eug.baranov@gmail.com>
> # Date 1437580631 -3600
> #      Wed Jul 22 16:57:11 2015 +0100
> # Branch stable
> # Node ID 38562df55db33073e9709efb1f567ee877050dcc
> # Parent  3e84f40232c7931dbeca16e563ae8f63ca12cf4d
> convert: when converting from Perforce use original local encoding by default
> 
> On Windows Perforce command line client uses default system locale to encode
> output. Using 'latin_1' causes locale-specific characters to be replaced with
> question marks. With this patch we will use default locale by default whilst
> allowing to specify it explicity with 'convert.p4.encoding' config option.
> 
> This is a potentially breaking change for any scripts relying on output treated
> as in 'latin_1' encoding.
> 
> Also because hgext.convert.convcmd overwrites detected default system locale
> with UTF-8 we had to introduce an import cycle in hgext.convert.p4 to retrieve
> originally detected encoding from hgext.convert.convcmd.

I've modified this to avoid the import cycle issue by delaying the
cyclic import to __init__. Queued for stable, thanks.

Patch

diff -r 3e84f40232c7 -r 38562df55db3 hgext/convert/__init__.py
--- a/hgext/convert/__init__.py	Sun Jul 26 09:28:52 2015 -0300
+++ b/hgext/convert/__init__.py	Wed Jul 22 16:57:11 2015 +0100
@@ -323,8 +323,11 @@ 
     usually should specify a target directory, because otherwise the
     target may be named ``...-hg``.
 
-    It is possible to limit the amount of source history to be
-    converted by specifying an initial Perforce revision:
+    The following options can be set with ``--config``:
+
+    :convert.p4.encoding: specify the encoding to use when decoding standard
+        output of the Perforce command line tool. The default is default system
+        encoding.
 
     :convert.p4.startrev: specify initial Perforce revision (a
         Perforce changelist number).
diff -r 3e84f40232c7 -r 38562df55db3 hgext/convert/p4.py
--- a/hgext/convert/p4.py	Sun Jul 26 09:28:52 2015 -0300
+++ b/hgext/convert/p4.py	Wed Jul 22 16:57:11 2015 +0100
@@ -9,6 +9,7 @@ 
 from mercurial.i18n import _
 
 from common import commit, converter_source, checktool, NoRepo
+import convcmd
 import marshal
 import re
 
@@ -54,7 +55,8 @@ 
         self.tags = {}
         self.lastbranch = {}
         self.parent = {}
-        self.encoding = "latin_1"
+        self.encoding = self.ui.config('convert', 'p4.encoding',
+                                       default=convcmd.orig_encoding)
         self.depotname = {}           # mapping from local name to depot name
         self.localname = {} # mapping from depot name to local name
         self.re_type = re.compile(
diff -r 3e84f40232c7 -r 38562df55db3 tests/test-convert.t
--- a/tests/test-convert.t	Sun Jul 26 09:28:52 2015 -0300
+++ b/tests/test-convert.t	Wed Jul 22 16:57:11 2015 +0100
@@ -273,9 +273,12 @@ 
       that when a depot path is given you then usually should specify a target
       directory, because otherwise the target may be named "...-hg".
   
-      It is possible to limit the amount of source history to be converted by
-      specifying an initial Perforce revision:
+      The following options can be set with "--config":
   
+      convert.p4.encoding
+                    specify the encoding to use when decoding standard output of
+                    the Perforce command line tool. The default is default
+                    system encoding.
       convert.p4.startrev
                     specify initial Perforce revision (a Perforce changelist
                     number).
diff -r 3e84f40232c7 -r 38562df55db3 tests/test-module-imports.t
--- a/tests/test-module-imports.t	Sun Jul 26 09:28:52 2015 -0300
+++ b/tests/test-module-imports.t	Wed Jul 22 16:57:11 2015 +0100
@@ -127,6 +127,7 @@ 
   mercurial/ui.py mixed imports
      stdlib:    formatter
      relative:  config, error, progress, scmutil, util
+  Import cycle: hgext.convert.convcmd -> hgext.convert.p4 -> hgext.convert.convcmd
   Import cycle: mercurial.cmdutil -> mercurial.context -> mercurial.subrepo -> mercurial.cmdutil
   Import cycle: hgext.largefiles.basestore -> hgext.largefiles.localstore -> hgext.largefiles.basestore
   Import cycle: mercurial.commands -> mercurial.commandserver -> mercurial.dispatch -> mercurial.commands