Patchwork [1,of,2] rust: extract function to convert Path to platform CString

login
register
mail settings
Submitter Yuya Nishihara
Date Jan. 12, 2018, 2:22 p.m.
Message ID <44289d88954aaaa2a3c5.1515766928@mimosa>
Download mbox | patch
Permalink /patch/26703/
State Accepted
Headers show

Comments

Yuya Nishihara - Jan. 12, 2018, 2:22 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1515762574 -32400
#      Fri Jan 12 22:09:34 2018 +0900
# Node ID 44289d88954aaaa2a3c559c424fa1f2d85cb7e16
# Parent  ea9bd35529f231c438630071119a309ba84dcc77
rust: extract function to convert Path to platform CString

It can be better on Unix.
Gregory Szorc - Jan. 14, 2018, 7:57 p.m.
On Fri, Jan 12, 2018 at 6:22 AM, Yuya Nishihara <yuya@tcha.org> wrote:

> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1515762574 -32400
> #      Fri Jan 12 22:09:34 2018 +0900
> # Node ID 44289d88954aaaa2a3c559c424fa1f2d85cb7e16
> # Parent  ea9bd35529f231c438630071119a309ba84dcc77
> rust: extract function to convert Path to platform CString
>

Queued this series, thanks.

FWIW, I was thinking that on Windows we could pass the raw/native Windows
args array into a Python list on the mercurial module or something. This
would bypass the CPython APIs to set argv. If we go through the CPython
APIs and use sys.argv, we need to normalize to char*. The advantage here is
Mercurial would be able to access the raw byte sequence and apply an
appropriate encoding. In other words, bypassing the restrictive Python 2.7
C API could result in less data loss.


>
> It can be better on Unix.
>
> diff --git a/rust/hgcli/src/main.rs b/rust/hgcli/src/main.rs
> --- a/rust/hgcli/src/main.rs
> +++ b/rust/hgcli/src/main.rs
> @@ -14,7 +14,7 @@ use libc::{c_char, c_int};
>
>  use std::env;
>  use std::path::PathBuf;
> -use std::ffi::CString;
> +use std::ffi::{CString, OsStr};
>  #[cfg(target_family = "unix")]
>  use std::os::unix::ffi::OsStringExt;
>
> @@ -62,6 +62,10 @@ fn get_environment() -> Environment {
>      }
>  }
>
> +fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
> +    CString::new(s.as_ref().to_str().unwrap()).unwrap()
> +}
> +
>  // On UNIX, argv starts as an array of char*. So it is easy to convert
>  // to C strings.
>  #[cfg(target_family = "unix")]
> @@ -86,9 +90,7 @@ fn args_to_cstrings() -> Vec<CString> {
>  }
>
>  fn set_python_home(env: &Environment) {
> -    let raw = CString::new(env.python_home.to_str().unwrap())
> -        .unwrap()
> -        .into_raw();
> +    let raw = cstring_from_os(&env.python_home).into_raw();
>      unsafe {
>          python27_sys::Py_SetPythonHome(raw);
>      }
> @@ -133,9 +135,7 @@ fn run() -> Result<(), i32> {
>      // Python files. Apparently we could define our own ``Py_GetPath()``
>      // implementation. But this may require statically linking Python,
> which is
>      // not desirable.
> -    let program_name = CString::new(env.python_exe.to_str().unwrap())
> -        .unwrap()
> -        .as_ptr();
> +    let program_name = cstring_from_os(&env.python_exe).as_ptr();
>      unsafe {
>          python27_sys::Py_SetProgramName(program_name as *mut i8);
>      }
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - Jan. 15, 2018, 1:24 p.m.
On Sun, 14 Jan 2018 11:57:09 -0800, Gregory Szorc wrote:
> FWIW, I was thinking that on Windows we could pass the raw/native Windows
> args array into a Python list on the mercurial module or something. This
> would bypass the CPython APIs to set argv. If we go through the CPython
> APIs and use sys.argv, we need to normalize to char*. The advantage here is
> Mercurial would be able to access the raw byte sequence and apply an
> appropriate encoding. In other words, bypassing the restrictive Python 2.7
> C API could result in less data loss.

I'm skeptical if Python codecs functions can loose unsupported characters in
the same way as Windows ANSI API. Some impl details might be important for
strict compatibility, but I'm not sure.

Patch

diff --git a/rust/hgcli/src/main.rs b/rust/hgcli/src/main.rs
--- a/rust/hgcli/src/main.rs
+++ b/rust/hgcli/src/main.rs
@@ -14,7 +14,7 @@  use libc::{c_char, c_int};
 
 use std::env;
 use std::path::PathBuf;
-use std::ffi::CString;
+use std::ffi::{CString, OsStr};
 #[cfg(target_family = "unix")]
 use std::os::unix::ffi::OsStringExt;
 
@@ -62,6 +62,10 @@  fn get_environment() -> Environment {
     }
 }
 
+fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
+    CString::new(s.as_ref().to_str().unwrap()).unwrap()
+}
+
 // On UNIX, argv starts as an array of char*. So it is easy to convert
 // to C strings.
 #[cfg(target_family = "unix")]
@@ -86,9 +90,7 @@  fn args_to_cstrings() -> Vec<CString> {
 }
 
 fn set_python_home(env: &Environment) {
-    let raw = CString::new(env.python_home.to_str().unwrap())
-        .unwrap()
-        .into_raw();
+    let raw = cstring_from_os(&env.python_home).into_raw();
     unsafe {
         python27_sys::Py_SetPythonHome(raw);
     }
@@ -133,9 +135,7 @@  fn run() -> Result<(), i32> {
     // Python files. Apparently we could define our own ``Py_GetPath()``
     // implementation. But this may require statically linking Python, which is
     // not desirable.
-    let program_name = CString::new(env.python_exe.to_str().unwrap())
-        .unwrap()
-        .as_ptr();
+    let program_name = cstring_from_os(&env.python_exe).as_ptr();
     unsafe {
         python27_sys::Py_SetProgramName(program_name as *mut i8);
     }