Patchwork D1581: [RFC] rust: Rust implementation of `hg` and standalone packaging

Submitter phabricator
Date Dec. 4, 2017, 7 a.m.
Message ID <differential-rev-PHID-DREV-vkui65ingyprrf5pxdsp-req@phab.mercurial-scm.org>
Permalink /patch/25912/
State Superseded

Comments

phabricator - Dec. 4, 2017, 7 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This commit demonstrates the feasibility of integrating Rust into
  Mercurial.
  
  There are two significant components of this commit:
  
  1. Establish a Rust `hg` binary.
  2. Establish a new mechanism for producing standalone Mercurial distributions.
  
  These functions will likely get split up before this commit lands.
  
  If you are familiar with Rust, the contents of the added rust/
  directory should be pretty straightforward. We create an "hgcli"
  package that implements a binary application to run Mercurial.
  The output of this package is an "hg" binary.
  
  Our Rust `hg` creates an embedded CPython interpreter and attempts
  to mimic the current functionality in the `hg` Python script. An
  immediate goal of the Rust implementation of `hg` is to serve as a
  drop-in replacement for the `hg` script. Longer term, we can discuss
  removing the Python script so `hg` is a native executable on all
  platforms. This will allow us to:
  
  - Merge `chg` functionality into `hg`
  - Start implementing very early `hg` logic in Rust (config parsing, argument parsing, repository opening, etc)
  - Possibly have some commands implemented in 100% Rust, allowing us to avoid Python interpreter startup overhead
  
  The added code for "standalone Mercurial" is a proof-of-concept
  demonstrating the direction I'd like to take Mercurial packaging
  and distribution in the near future. Today, we generally treat
  Mercurial as a Python application. This by-and-large works. But
  as we start writing things in Rust, that will pull us away from
  being Python centric. In addition, the current distribution model
  is fragile because it often relies on the Python that end-users
  have installed. We constantly run into problems with:
  
  - Users running old Python versions with known bugs and lacking features (such as modern SSL/TLS support).
  - Missing optional packages that improve the Mercurial experience (such as re2, pygments, and even the curses module on Windows).
  - Python packaging fragility.
  
  While Python is still a critical component of Mercurial and will be
  for the indefinite future, I'd like Mercurial to pivot away from
  being pictured as a "Python application" and move towards being
  a "generic/system application." In other words, Python is just
  an implementation detail.
  
  The added Python script produces a standalone Mercurial
  installation. It currently only works on Linux, MacOS, and similar
  *NIX variants (although I haven't tested it thoroughly and it may
  be broken on MacOS after recent changes). Essentially, the script
  downloads the Python 2.7 source code, builds Python from source,
  builds the Mercurial Rust and Python components, and produces a
  tarball containing the Rust `hg` binary and all the support files.
  The rpath of the `hg` binary is relative. So you should be able
  to drop the files into any system and things "just work." That's
  the goal anyway: we don't come close to fulfilling this end state.
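  
  As a rough illustration of the relative layout described above, the
  following sketch derives the bundled interpreter path from the location
  of the `hg` executable. The helper name `standalone_python` is
  hypothetical; the `bin/hg` and `hgpython/bin/python2.7` layout is taken
  from the comments in main.rs.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: given the path to `bin/hg`, return the expected
/// location of the bundled interpreter at `hgpython/bin/python2.7`.
fn standalone_python(exe: &Path) -> Option<PathBuf> {
    // Strip `hg` and then `bin` to reach the install root.
    let root = exe.parent()?.parent()?;
    Some(root.join("hgpython").join("bin").join("python2.7"))
}
```

  A caller would still need to check that the returned path exists before
  treating the install as standalone.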
  
  This patch should be considered early alpha and RFC quality. The
  following known issues and open investigation items exist:
  
  - This is the first Rust code I've written. I'm almost certainly doing many Rust things poorly. I'm pretty sure I'm a bit too liberal with my .unwrap() usage.
  - I haven't verified that the produced standalone Mercurial is actually portable.
  - HGUNICODEPEDANTRY likely doesn't work.
  - rpath values are being hacked in post build. We should set the rpath properly at link time.
  - Sometimes running the Rust `hg` from target/<build>/hg fails because it can't locate Python files. The `build/standalone/bin/hg` binary in the standalone install should "just work."
  - The path searching in the Rust `hg` to find libpython and Mercurial python files is very hacky.
  - Rust `hg` is using std::env::args() instead of std::env::args_os(). I think we want the latter so Rust doesn't panic if it sees arguments that aren't UTF-8.
  - main.rs is re-defining some CPython APIs that are already defined in python27-sys. If you attempt to make python27-sys a dependency on hgcli, Cargo complains that multiple packages are linking to libpython. I didn't feel like going down that rabbit hole.
  - cpython and python27-sys can pick up other Python installs and libraries on the system. I've seen issues with missing PyUnicode_* symbols because of UCS2 and UCS4 mismatch.
  - No support for Windows yet (but I have a plan).
  - The cpython crate seems to be optimized for writing Python extensions, not embedding Python. As a result, the interaction between the Rust `hg` and this crate is a bit wonky. For example, I'm pretty sure we don't need to call `PyEval_InitThreads()`. But the cpython crate insists a thread already exist if the Python interpreter is already initialized. So... ???
  - Python may not be receiving OS signals.
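
The args() versus args_os() item above can be sketched as follows. This is a minimal illustration, not code from the patch: the helper name `decode_args` is hypothetical, and it only shows how non-UTF-8 arguments can surface as an error value instead of a panic.

```rust
use std::ffi::OsString;

/// Hypothetical helper: try to decode every argument as UTF-8.
/// On failure, the first offending argument is returned unchanged.
fn decode_args<I>(args: I) -> Result<Vec<String>, OsString>
where
    I: IntoIterator<Item = OsString>,
{
    // `OsString::into_string` hands back the original value on failure,
    // so nothing panics when an argument is not valid Unicode.
    args.into_iter().map(OsString::into_string).collect()
}
```

It could be called as `decode_args(std::env::args_os())`, leaving the caller to decide how to report or pass through an undecodable argument.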

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1581

AFFECTED FILES
  .hgignore
  contrib/STANDALONE-MERCURIAL.rst
  contrib/build-standalone.py
  rust/.hgignore
  rust/Cargo.lock
  rust/Cargo.toml
  rust/README.rst
  rust/hgcli/Cargo.toml
  rust/hgcli/build.rs
  rust/hgcli/src/main.rs

To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Dec. 4, 2017, 10:28 p.m.
durin42 added inline comments.

INLINE COMMENTS

> main.rs:130
> +
> +fn main() {
> +    let env = get_environment();

You might be able to usefully avoid all the unwrap() in here by having an extra layer, eg

fn run() -> Result<(), failure::Error> {
...

  let program_name = CString::new(env.python_exe.to_str()?)?.as_ptr();

}

and then

fn main() {

  let mut exit_code;
  match run() {
    Ok(_) => exit_code = 0,
    Err(e) => { println!("{:?}", e); exit_code = 1; },
  }

  std::process::exit(exit_code);

}

etc
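
A self-contained variant of this idea, sketched with only the standard library: `Box<dyn Error>` is swapped in for `failure::Error`, and the helper names `program_name` and `exit_code` are hypothetical. Note that the helper returns the owning `CString` instead of a raw pointer, so the caller can keep the backing memory alive.

```rust
use std::error::Error;
use std::ffi::CString;
use std::path::Path;

/// Hypothetical helper: build the program name, reporting failures as
/// errors instead of panicking. The caller owns the CString and must keep
/// it alive for as long as the interpreter uses the pointer.
fn program_name(python_exe: &Path) -> Result<CString, Box<dyn Error>> {
    let s = python_exe
        .to_str()
        .ok_or("Python path is not valid UTF-8")?;
    // Fails if the path contains an interior NUL byte.
    Ok(CString::new(s)?)
}

/// Translate the outcome of a fallible run into a process exit code.
fn exit_code(result: Result<(), Box<dyn Error>>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(e) => {
            eprintln!("{}", e);
            1
        }
    }
}
```

With a `run()` function returning `Result<(), Box<dyn Error>>`, `main` would reduce to `std::process::exit(exit_code(run()))`.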

To: indygreg, #hg-reviewers
Cc: durin42, dlax, mercurial-devel
phabricator - Dec. 4, 2017, 11:13 p.m.
quark added inline comments.

INLINE COMMENTS

> main.rs:19-31
> +extern "C" {
> +    pub fn Py_SetPythonHome(arg1: *mut c_char);
> +    pub fn Py_SetProgramName(arg1: *const c_char);
> +    pub fn Py_Initialize();
> +    pub fn Py_Finalize();
> +    pub fn PyEval_InitThreads();
> +    // This actually returns a pointer to a struct.

Maybe use `extern crate python27_sys` so part of them get defined.

> main.rs:34-37
> +    _exe : PathBuf,
> +    python_exe : PathBuf,
> +    python_home : PathBuf,
> +    mercurial_modules : PathBuf,

Have you tried `rustfmt`? It will remove the space before `:`.

> main.rs:111
> +    if pedantry {
> +        sys_mod.call(*py, "setdefaultencoding", ("undefined",), None).unwrap();
> +    }

Maybe use `.expect("setdefaultencoding failed")`.

> main.rs:115
> +
> +fn update_modules_path(env : &Environment, py : &Python, sys_mod : &PyModule) {
> +    let sys_path = sys_mod.get(*py, "path").unwrap();

Maybe `py: Python`. That's what `rust-cpython` uses. It implements `Copy`.

> main.rs:120
> +
> +fn run(py : &Python) -> PyResult<()> {
> +    let demand_mod = py.import("hgdemandimport")?;

Probably use `PyResult<int>` since `dispatch.run` returns the exit code.

> main.rs:172
> +            err.print(py);
> +            exit_code = 1;
> +        },

`255` is probably better since that's an uncaught exception.

To: indygreg, #hg-reviewers
Cc: quark, durin42, dlax, mercurial-devel
phabricator - Dec. 5, 2017, 2:40 p.m.
yuja added inline comments.

INLINE COMMENTS

> main.rs:101
> +fn set_python_home(env: &Environment) {
> +    let raw = CString::new(env.python_home.to_str().unwrap())
> +        .unwrap()

Perhaps we'll need a utility function for platform-specific cstr
conversion.

On Unix, the story is simple. We can use OsStrExt. On Windows, 
maybe we'll have to first convert it to "wide characters" by OsStrExt
and then call `WideCharToMultiByte` to convert it back to ANSI bytes sequence.

I have no idea if Rust stdlib provides a proper way to convert
OsStr to Windows ANSI bytes.

https://stackoverflow.com/a/38948854
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx
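
A minimal sketch of such a utility, assuming a hypothetical helper named `os_str_to_cstring`. Only the Unix branch does a real byte-level conversion; the non-Unix branch is a placeholder that accepts valid Unicode only and does not implement the `WideCharToMultiByte` route described above.

```rust
use std::ffi::{CString, OsStr};

/// Hypothetical helper: convert an OS string into a C string.
#[cfg(unix)]
fn os_str_to_cstring(s: &OsStr) -> Option<CString> {
    use std::os::unix::ffi::OsStrExt;
    // On Unix an OsStr is arbitrary bytes, so no UTF-8 check is needed.
    // Conversion still fails if there is an interior NUL byte.
    CString::new(s.as_bytes()).ok()
}

/// Placeholder for other platforms: only valid Unicode is accepted here.
/// A real Windows version would need a proper ANSI conversion.
#[cfg(not(unix))]
fn os_str_to_cstring(s: &OsStr) -> Option<CString> {
    s.to_str().and_then(|s| CString::new(s).ok())
}
```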

To: indygreg, #hg-reviewers
Cc: yuja, quark, durin42, dlax, mercurial-devel
phabricator - Dec. 8, 2017, 7:32 a.m.
indygreg marked 6 inline comments as done.
indygreg added a comment.


  I removed the standalone distribution code and cleaned things up a bit.
  
  Over 97% of the tests pass with the Rust `hg` on Linux. And the failures seem reasonable.
  
  There's a bit of work to be done around packaging, Windows support, Rust linting, etc. But I'd like to get something landed so others have something to play with and so we're not reviewing a massive patch.
  
  What I'm trying to say is "I think this is ready for a real review."

INLINE COMMENTS

> quark wrote in main.rs:120
> Probably use `PyResult<int>` since `dispatch.run` returns the exit code.

``dispatch.run()`` actually calls ``sys.exit()``. There is a bug somewhere in this code around exit code handling. But it only presents in a few tests, which is odd. I documented that in the commit message.

> yuja wrote in main.rs:101
> Perhaps we'll need a utility function for platform-specific cstr
> conversion.
> 
> On Unix, the story is simple. We can use OsStrExt. On Windows, 
> maybe we'll have to first convert it to "wide characters" by OsStrExt
> and then call `WideCharToMultiByte` to convert it back to ANSI bytes sequence.
> 
> I have no idea if Rust stdlib provides a proper way to convert
> OsStr to Windows ANSI bytes.
> 
> https://stackoverflow.com/a/38948854
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx

I documented a potential path forward in the code. I was thinking we can address this in a follow-up because it is non-trivial to implement. And we may want to do something funky, such as inject the raw bytes into `mercurial.argv` or something. Since we can call the Windows API directly, we can actually do that now. This opens up a lot of possibilities for encoding...

To: indygreg, #hg-reviewers
Cc: yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 5, 2018, 7:38 a.m.
yuja requested changes to this revision.
yuja added a comment.
This revision now requires changes to proceed.


  Supposing this is a kind of contrib code, I think it's good enough to accept.
  Can you drop Cargo.lock file?

INLINE COMMENTS

> Cargo.lock:1
> +[[package]]
> +name = "aho-corasick"

Perhaps Cargo.lock should be excluded from the commit.

> build.rs:88
> +
> +static REQUIRED_CONFIG_FLAGS: [&'static str; 2] = ["Py_USING_UNICODE", "WITH_THREAD"];
> +

Nit: `const` ?

> main.rs:62
> +        mercurial_modules: mercurial_modules.to_path_buf(),
> +    }
> +}

Nit: probably we don't have to `clone()` here, just move these values.

To: indygreg, #hg-reviewers, yuja
Cc: yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 10, 2018, 2:07 a.m.
indygreg added inline comments.

INLINE COMMENTS

> yuja wrote in Cargo.lock:1
> Perhaps Cargo.lock should be excluded from the commit.

According to https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries (and other places I found on the Internet), `Cargo.lock` files under version control are a good practice for binaries - but not libraries.

Since this `Cargo.lock` encompasses the `hgcli` binary, I think that counts.

FWIW, Firefox vendors `Cargo.lock` files for libraries. Although those libraries are attached to C++ binaries, so there isn't an appropriate Rust binary to attach to. But a main purpose of `Cargo.lock` files is to add determinism. For that reason, I think we want them vendored.

To: indygreg, #hg-reviewers, yuja
Cc: yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 10, 2018, 1:06 p.m.
yuja added inline comments.

INLINE COMMENTS

> indygreg wrote in Cargo.lock:1
> According to https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries (and other places I found on the Internet), `Cargo.lock` files under version control are a good practice for binaries - but not libraries.
> 
> Since this `Cargo.lock` encompasses the `hgcli` binary, I think that counts.
> 
> FWIW, Firefox vendors `Cargo.lock` files for libraries. Although those libraries are attached to C++ binaries, so there isn't an appropriate Rust binary to attach to. But a main purpose of `Cargo.lock` files is to add determinism. For that reason, I think we want them vendored.

> Cargo.lock files under version control are a good practice for binaries

Okay, I didn't know that, thanks. (FWIW, I don't like the idea of version-controlling
strict build dependencies, but let's just follow their rule.)

Can we move Cargo.lock under hgcli directory, then? Sounds like we'll need a
separate Cargo.lock file for "libraries".

To: indygreg, #hg-reviewers, yuja
Cc: yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 10, 2018, 7:32 p.m.
cramertj added a comment.


  Looks pretty good to me-- left a couple nits inline.
  
  There are a lot of uses of `unwrap`, `expect`, and `panic` that can (and probably should) be replaced with proper error handling using `Result` (and the `failure` crate).
  
  There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own `unsafe` calls to the python interpreter.

INLINE COMMENTS

> build.rs:12
> +#[cfg(target_os = "windows")]
> +use std::path::PathBuf;
> +

Nit: if you move this import into `have_shared`, you'll only need one `cfg` and it'll be easier to validate that the proper deps are available for the proper platforms.

> build.rs:88
> +
> +const REQUIRED_CONFIG_FLAGS: [&'static str; 2] = ["Py_USING_UNICODE", "WITH_THREAD"];
> +

Nit: not sure what version you're targeting, but `'static` is automatic for `const` vars, so you could write `[&str; 2]`

> main.rs:45
> +
> +    let python_exe: &'static str = env!("PYTHON_INTERPRETER");
> +    let python_exe = PathBuf::from(python_exe);

Nit: you can just write `&str`. Also, I'm not familiar with what you're trying to do here, but is the PYTHON_INTERPRETER always determined at compile-time? It seems like something you might want to switch on at runtime. Is that not the case?

> main.rs:125
> +
> +    // Set program name. The backing memory needs to live for the duration of the
> +    // interpreter.

If it needs to live for the whole time, consider storing it in a `static` or similar. There's a subtle unsafety here: if this program panics (due to `panic`, `unwrap`, or `expect`), `program_name` will be dropped (free'd) before the python interpreter is killed (when the process ends, that is-- `Finalize` won't ever be called in that case). I don't know how much of an issue this will be in practice, but it's something to think about.
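
One way to get that process-long lifetime without a `static` is to leak the string on purpose. The sketch below is an assumption about how that could look, with a hypothetical helper name `leak_c_string`; it relies on `CString::into_raw`, which hands ownership to the caller so that Rust never frees the buffer.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: allocate a C string and intentionally leak it so
/// the pointer stays valid until the process exits.
fn leak_c_string(s: &str) -> Option<*const c_char> {
    // `into_raw` gives up ownership, so no destructor runs on unwind.
    CString::new(s).ok().map(|c| c.into_raw() as *const c_char)
}
```

The leak is bounded (one allocation per call), which seems acceptable for a value that is set once at startup.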

To: indygreg, #hg-reviewers, yuja
Cc: cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 10, 2018, 9:46 p.m.
indygreg added a comment.


  In https://phab.mercurial-scm.org/D1581#31029, @cramertj wrote:
  
  > There are a lot of uses of `unwrap`, `expect`, and `panic` that can (and probably should) be replaced with proper error handling using `Result` (and the `failure` crate).
  
  
  I definitely agree. I still consider this just beyond proof-of-concept code. We'll definitely want to shore things up before we ship. Perfect is very much the enemy of good at this stage.
  
  > There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own `unsafe` calls to the python interpreter.
  
  pyo3 requires non-stable Rust features last I checked. That makes it a non-starter for us at this time (since downstream packagers will insist on only using stable Rust).
  
  If other external dependencies provide the interfaces we need, I'm open to taking those dependencies. But this crate is focused on embedding a Python interpreter. Most (all?) of the Rust+Python crates I found seemed to target the "implementing Python extensions with Rust" use case, not embedding Python. As such, their embedding API support is very lacking. I even had to fork rust-cpython because it didn't implement the proper APIs and is forcing its extension-centric religion on consumers. I've upstreamed most of my modifications though. So hopefully the fork doesn't live much longer...

To: indygreg, #hg-reviewers, yuja
Cc: cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 10, 2018, 9:49 p.m.
cramertj added a comment.


  In https://phab.mercurial-scm.org/D1581#31035, @indygreg wrote:
  
  > In https://phab.mercurial-scm.org/D1581#31029, @cramertj wrote:
  >
  > > There are a lot of uses of `unwrap`, `expect`, and `panic` that can (and probably should) be replaced with proper error handling using `Result` (and the `failure` crate).
  >
  >
  > I definitely agree. I still consider this just beyond proof-of-concept code. We'll definitely want to shore things up before we ship. Perfect is very much the enemy of good at this stage.
  >
  > > There are also a couple of crates that provide safe bindings to Python interpreters-- I'm not sure what your external dependency situation is, but you might consider using something like https://crates.io/crates/pyo3 rather than writing your own `unsafe` calls to the python interpreter.
  >
  > pyo3 requires non-stable Rust features last I checked. That makes it a non-starter for us at this time (since downstream packagers will insist on only using stable Rust).
  >
  > If other external dependencies provide the interfaces we need, I'm open to taking those dependencies. But this crate is focused on embedding a Python interpreter. Most (all?) of the Rust+Python crates I found seemed to target the "implementing Python extensions with Rust" use case, not embedding Python. As such, their embedding API support is very lacking. I even had to fork rust-cpython because it didn't implement the proper APIs and is forcing its extension-centric religion on consumers. I've upstreamed most of my modifications though. So hopefully the fork doesn't live much longer...
  
  
  SGTM! Thanks for clarifying.

To: indygreg, #hg-reviewers, yuja
Cc: cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 11, 2018, 3:41 a.m.
indygreg marked 7 inline comments as done.
indygreg added a comment.


  I'll address this and other review feedback in follow-up patches.
  
  Thanks for the review!

INLINE COMMENTS

> cramertj wrote in main.rs:45
> Nit: you can just write `&str`. Also, I'm not familiar with what you're trying to do here, but is the PYTHON_INTERPRETER always determined at compile-time? It seems like something you might want to switch on at runtime. Is that not the case?

This is meant to be dynamic. The gist of this code is we're trying to find the location of the Python install given various search strategies. The search strategy is (currently) defined at compile time. And this `localdev` search strategy defines the path to Python at compile time. Look for more features and documentation for this code later. (I stripped out unused code from this patch to make it smaller.)

To: indygreg, #hg-reviewers, yuja, durin42
Cc: cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 11, 2018, 11:01 a.m.
kevincox added a comment.


  Overall it looks great. I added a bunch of nitpicks. The most common suggestion was to add some contextual information to error messages. Other than that, mostly minor style improvements. Feel free to push back on anything you don't agree with :)

INLINE COMMENTS

> cramertj wrote in build.rs:88
> Nit: not sure what version you're targeting, but `'static` is automatic for `const` vars, so you could write `[&str; 2]`

I would also recommend using a slice if you don't intend the size of the array to be part of the type signature.

  const REQUIRED_CONFIG_FLAGS: &[&str] = &["Py_USING_UNICODE", "WITH_THREAD"];

> build.rs:33
> +        );
> +    }
> +

  assert!(
      Path::new(&python).exists(),
      "Python interpreter {} does not exist; this should never happen",
      python);

I would also recommend `{:?}` as the quotes highlight the injected variable nicely and it will make some hard-to-notice things more visible due to escaping.
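
A small sketch of what `{:?}` buys here, using a hypothetical helper `missing_python_message`. Debug formatting wraps the value in quotes and escapes control characters, so stray whitespace in a path becomes visible.

```rust
/// Hypothetical helper that formats the error message with `{:?}`.
fn missing_python_message(python: &str) -> String {
    // Debug formatting quotes the string and escapes characters like tabs.
    format!("Python interpreter {:?} does not exist", python)
}
```

For example, a trailing space in the interpreter name shows up inside the quotes instead of being lost at the end of the line.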

> build.rs:110
> +        if !result {
> +            panic!("Detected Python requires feature {}", key);
> +        }

Use `assert!`.

> build.rs:116
> +    if !have_shared(&config) {
> +        panic!("Detected Python lacks a shared library, which is required");
> +    }

Use `assert!`

> build.rs:127
> +        panic!("Detected Python doesn't support UCS-4 code points");
> +    }
> +}

  #[cfg(not(target_os = "windows"))]
  assert_eq!(
      config.config.get("Py_UNICODE_SIZE"), Some("4"),
      "Detected Python doesn't support UCS-4 code points");

> main.rs:37
> +fn get_environment() -> Environment {
> +    let exe = env::current_exe().unwrap();
> +

Use expect for a better error message. `.expect("Error getting executable path")`

> main.rs:91
> +        .unwrap()
> +        .into_raw();
> +    unsafe {

This method allows paths that aren't valid UTF-8 by avoiding ever becoming a `str`.

  CString::new(env.python_home.as_ref().as_bytes())

I would also change the unwrap to `.expect("Error setting python home")`.

> main.rs:99
> +    // Call sys.setdefaultencoding("undefined") if HGUNICODEPEDANTRY is set.
> +    let pedantry = env::var("HGUNICODEPEDANTRY").is_ok();
> +

It appears that HG accepts `HGUNICODEPEDANTRY=` as not enabling unicode pedantry. Maybe the behavior should be the same here. Untested code below should work.

  let pedantry = !env::var("HGUNICODEPEDANTRY").unwrap_or_default().is_empty();
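
The same check can be sketched without requiring the value to be valid UTF-8. The helper name `pedantry_enabled` is hypothetical, and taking the value as a parameter is only done here so the logic can be exercised without touching the process environment.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: pedantry is on only when the variable is set to a
/// non-empty value, so `HGUNICODEPEDANTRY=` counts as disabled.
fn pedantry_enabled(value: Option<&OsStr>) -> bool {
    value.map_or(false, |v| !v.is_empty())
}
```

In the binary this might be called as `pedantry_enabled(env::var_os("HGUNICODEPEDANTRY").as_deref())`.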

> main.rs:111
> +fn update_modules_path(env: &Environment, py: Python, sys_mod: &PyModule) {
> +    let sys_path = sys_mod.get(py, "path").unwrap();
> +    sys_path

`.expect("Error accessing sys.path")`

> main.rs:133
> +    // not desirable.
> +    let program_name = CString::new(env.python_exe.to_str().unwrap())
> +        .unwrap()

`CString::new(env.python_exe.as_ref().as_bytes())`

> main.rs:134
> +    let program_name = CString::new(env.python_exe.to_str().unwrap())
> +        .unwrap()
> +        .as_ptr();

`.expect("Error setting program name")`

> main.rs:200
> +fn run_py(env: &Environment, py: Python) -> PyResult<()> {
> +    let sys_mod = py.import("sys").unwrap();
> +

Since this method returns a Result why not handle the error?

  let sys_mod = py.import("sys")?;

To: indygreg, #hg-reviewers, yuja, durin42
Cc: kevincox, cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 11, 2018, 12:31 p.m.
yuja added inline comments.

INLINE COMMENTS

> kevincox wrote in main.rs:133
> `CString::new(env.python_exe.as_ref().as_bytes())`

That's more correct on Unix, but wouldn't work on Windows since 
native string is UTF-16 variant (Rust) or ANSI (Python 2.7).

To: indygreg, #hg-reviewers, yuja, durin42
Cc: kevincox, cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 11, 2018, 1:09 p.m.
kevincox added inline comments.

INLINE COMMENTS

> yuja wrote in main.rs:133
> That's more correct on Unix, but wouldn't work on Windows since 
> native string is UTF-16 variant (Rust) or ANSI (Python 2.7).

Oops, for some reason I thought this was in a unix block. `as_bytes()` isn't actually available otherwise. I think that this is fine then. Non-UTF-8 paths shouldn't be a major issue.

To: indygreg, #hg-reviewers, yuja, durin42
Cc: kevincox, cramertj, yuja, quark, durin42, dlax, mercurial-devel
phabricator - Jan. 11, 2018, 5:09 p.m.
ruuda added a comment.


  Awesome! I have just a few things that could be written more briefly.

INLINE COMMENTS

> build.rs:39
> +    let script = "import sysconfig; \
> +c = sysconfig.get_config_vars(); \
> +print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))";

You can safely indent those, leading whitespace on the continuation line will be stripped from the string literal.
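
To illustrate, here is the same script literal written with indented continuation lines. The wrapper function `sysconfig_script` is hypothetical and exists only to make the resulting string easy to inspect.

```rust
/// Hypothetical wrapper around the script literal from build.rs.
fn sysconfig_script() -> &'static str {
    // A trailing backslash skips the newline and all leading whitespace on
    // the next line; the space before each backslash is kept.
    "import sysconfig; \
     c = sysconfig.get_config_vars(); \
     print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))"
}
```

The resulting string is a single line with one space between statements, identical to the unindented form.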

> build.rs:107
> +            None => false,
> +        };
> +

There is no need to match:

  let result = config.config.get(*key) == Some("1");

Or using assert as @kevincox recommends:

  assert_eq!(config.config.get(*key), Some("1"), "Detected ...");
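
Putting the flag suggestions together, a minimal sketch might look like the following. It assumes the config map is a `HashMap<String, String>` as in the build.rs under review; in that case the looked-up `&String` has to be turned into `&str` before it can be compared with `Some("1")`. The function name `missing_flags` is hypothetical.

```rust
use std::collections::HashMap;

const REQUIRED_CONFIG_FLAGS: &[&str] = &["Py_USING_UNICODE", "WITH_THREAD"];

/// Hypothetical helper: list the required flags that are not set to "1".
fn missing_flags(config: &HashMap<String, String>) -> Vec<&'static str> {
    REQUIRED_CONFIG_FLAGS
        .iter()
        .cloned()
        // Convert Option<&String> to Option<&str> so it compares with "1".
        .filter(|key| config.get(*key).map(|v| v.as_str()) != Some("1"))
        .collect()
}
```

Returning the missing flags instead of asserting inline lets the build script report all problems in one message.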

> main.rs:187
> +                Err(255)
> +            }
> +            Ok(()) => Ok(()),

There exists `Result::map_err` for this:

  result = run_py(&env, py).map_err(|err| {
      err.print(py);
      255
  });
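
The same pattern in a form that compiles without the cpython crate, as a sketch only: the generic helper `to_exit_code` is hypothetical, and printing with `{:?}` stands in for `err.print(py)`.

```rust
use std::fmt::Debug;

/// Hypothetical helper: report an error and map it to exit code 255.
fn to_exit_code<E: Debug>(result: Result<(), E>) -> Result<(), i32> {
    result.map_err(|err| {
        // Stand-in for printing the Python traceback.
        eprintln!("{:?}", err);
        255
    })
}
```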

To: indygreg, #hg-reviewers, yuja, durin42
Cc: ruuda, kevincox, cramertj, yuja, quark, durin42, dlax, mercurial-devel

Patch

diff --git a/rust/hgcli/src/main.rs b/rust/hgcli/src/main.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgcli/src/main.rs
@@ -0,0 +1,182 @@ 
+// main.rs -- Main routines for `hg` program
+//
+// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+extern crate libc;
+extern crate cpython;
+extern crate which;
+
+use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};
+use libc::{c_char, c_int};
+
+use std::env;
+use std::path::PathBuf;
+use std::ffi::CString;
+
+extern "C" {
+    pub fn Py_SetPythonHome(arg1: *mut c_char);
+    pub fn Py_SetProgramName(arg1: *const c_char);
+    pub fn Py_Initialize();
+    pub fn Py_Finalize();
+    pub fn PyEval_InitThreads();
+    // This actually returns a pointer to a struct.
+    pub fn PyEval_SaveThread() -> *mut c_char;
+    pub fn PySys_SetArgv(arg1: c_int, arg2: *const *const c_char) -> ();
+    pub fn PySys_SetArgvEx(arg1: c_int, arg2: *const *const c_char,
+                           arg3: c_int) -> ();
+}
+
+#[derive(Debug)]
+struct Environment {
+    _exe : PathBuf,
+    python_exe : PathBuf,
+    python_home : PathBuf,
+    mercurial_modules : PathBuf,
+}
+
+// TODO we probably want to customize the behavior of this function
+// based on cargo features/config.
+fn get_environment() -> Environment {
+    let exe_buf = env::current_exe().unwrap();
+
+    // Standalone layout is:
+    // bin/hg
+    // hgpython/bin/python2.7
+    // hgpython/lib/libpython2.7.so
+    let mut standalone_root = exe_buf.clone();
+    standalone_root.pop();
+    standalone_root.pop();
+
+    let mut standalone_python_exe = standalone_root.clone();
+    standalone_python_exe.push("hgpython");
+    standalone_python_exe.push("bin");
+    standalone_python_exe.push("python2.7");
+
+    let (is_standalone, python_exe) = match standalone_python_exe.exists() {
+        true => (true, standalone_python_exe),
+        // TODO handle failure gracefully.
+        false => (false, which::which("python2.7").unwrap()),
+    };
+
+    let mut python_home = python_exe.clone();
+    python_home.pop();
+    python_home.pop();
+
+    let mercurial_modules = if is_standalone {
+        let mut p = standalone_root.clone();
+        p.push("mercurial");
+        p
+    } else {
+        // rust/target/<build>/hg
+        let mut p = exe_buf.clone();
+        p.pop();
+        p.pop();
+        p.pop();
+        p.pop();
+
+        p.push("mercurial");
+        if !p.exists() {
+            panic!("could not find Mercurial modules");
+        }
+        p.pop();
+
+        p
+    };
+
+    Environment {
+        _exe: exe_buf.clone(),
+        python_exe: python_exe.clone(),
+        python_home: python_home.clone(),
+        mercurial_modules : mercurial_modules.clone(),
+    }
+}
+
+fn set_python_home(env : &Environment) {
+    let raw = CString::new(env.python_home.to_str().unwrap()).unwrap().into_raw();
+    unsafe {
+        Py_SetPythonHome(raw);
+    }
+}
+
+fn update_encoding(py : &Python, sys_mod : &PyModule) {
+    // Call sys.setdefaultencoding("undefined") if HGUNICODEPEDANTRY is set.
+    let pedantry = env::var("HGUNICODEPEDANTRY").is_ok();
+
+    // TODO do we need to call reload(sys) here? Should we set Python encoding
+    // before we start Python interpreter?
+    if pedantry {
+        sys_mod.call(*py, "setdefaultencoding", ("undefined",), None).unwrap();
+    }
+}
+
+fn update_modules_path(env : &Environment, py : &Python, sys_mod : &PyModule) {
+    let sys_path = sys_mod.get(*py, "path").unwrap();
+    sys_path.call_method(*py, "insert", (0, env.mercurial_modules.to_str()), None).unwrap();
+}
+
+fn run(py : &Python) -> PyResult<()> {
+    let demand_mod = py.import("hgdemandimport")?;
+    demand_mod.call(*py, "enable", NoArgs, None)?;
+
+    let dispatch_mod = py.import("mercurial.dispatch")?;
+    dispatch_mod.call(*py, "run", NoArgs, None)?;
+
+    Ok(())
+}
+
+fn main() {
+    let env = get_environment();
+
+    //println!("{:?}", env);
+
+    // Tell Python where it is installed.
+    set_python_home(&env);
+
+    // Set program name. The backing memory needs to live for the duration of the
+    // interpreter.
+    let program_name = CString::new(env.python_exe.to_str().unwrap()).unwrap().as_ptr();
+    unsafe {
+        Py_SetProgramName(program_name);
+    }
+
+    // TODO https://docs.python.org/2/c-api/init.html#c.PySys_SetArgvEx says
+    // 1. We may wish to not update sys.path as part of setting args
+    // 2. Initial argument should be empty string since we're not executing a Python script
+    let args : Vec<CString> = env::args().map(|s| CString::new(s).unwrap()).collect();
+    let argv : Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();
+
+    unsafe {
+        Py_Initialize();
+        PySys_SetArgv(args.len() as c_int,
+                      argv.as_ptr());
+        PyEval_InitThreads();
+        let _thread_state = PyEval_SaveThread();
+    }
+
+    let gil = Python::acquire_gil();
+    let py = gil.python();
+
+    let sys_mod = py.import("sys").unwrap();
+
+    update_encoding(&py, &sys_mod);
+    update_modules_path(&env, &py, &sys_mod);
+
+    let mut exit_code : i32 = 0;
+
+    match run(&py) {
+        Err(err) => {
+            err.print(py);
+            exit_code = 1;
+        },
+        Ok(()) => (),
+    };
+
+    unsafe {
+        Py_Finalize();
+    }
+
+    std::process::exit(exit_code);
+}
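Review note on the program-name handling in `main()`: the comment there says the backing memory must live as long as the interpreter. A minimal standalone sketch of that ownership rule follows. It is not part of the patch; `c_strlen` is a hypothetical stand-in for a C API (such as `Py_SetProgramName`) that reads the pointer after the call site, and the path is an illustrative value.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that reads a NUL-terminated string.
fn c_strlen(ptr: *const c_char) -> usize {
    // Safety: callers must pass a pointer to a live, NUL-terminated buffer.
    unsafe { CStr::from_ptr(ptr).to_bytes().len() }
}

fn main() {
    // Bind the CString to a variable so its buffer stays allocated.
    let program_name = CString::new("/hgpython/bin/python2.7").unwrap();

    // Take the raw pointer from the live owner, not from a temporary.
    let ptr = program_name.as_ptr();
    println!("{}", c_strlen(ptr));

    // `program_name` is dropped at the end of `main`, after the last use of `ptr`.
}
```

Calling `.as_ptr()` directly on the result of `CString::new(...).unwrap()` inside a `let` statement would free the buffer at the end of that statement, leaving the pointer dangling.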
diff --git a/rust/hgcli/build.rs b/rust/hgcli/build.rs
new file mode 100644
--- /dev/null
+++ b/rust/hgcli/build.rs
@@ -0,0 +1,77 @@ 
+// build.rs -- Configure build environment for `hgcli` Rust package.
+//
+// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+use std::collections::HashMap;
+use std::env;
+use std::process::Command;
+
+struct PythonConfig {
+    python : String,
+    config : HashMap<String, String>,
+}
+
+fn get_python_config() -> PythonConfig {
+    let python = match env::var_os("PYTHON_SYS_EXECUTABLE") {
+        Some(path) => path.into_string().unwrap(),
+        None => String::from("python2.7"),
+    };
+
+    let separator = "SEPARATOR STRING";
+
+    let script = "import sysconfig; \
+c = sysconfig.get_config_vars(); \
+print('SEPARATOR STRING'.join('%s=%s' % i for i in c.items()))";
+
+    let mut command = Command::new(&python);
+    command.arg("-c").arg(script);
+
+    let out = command.output().unwrap();
+
+    if !out.status.success() {
+        panic!("python script failed: {}", String::from_utf8_lossy(&out.stderr));
+    }
+
+    let stdout = String::from_utf8_lossy(&out.stdout);
+    let mut m = HashMap::new();
+
+    for entry in stdout.split(separator) {
+        let mut parts = entry.splitn(2, "=");
+        let key = parts.next().unwrap();
+        let value = parts.next().unwrap();
+        m.insert(String::from(key), String::from(value));
+    }
+
+    PythonConfig {
+        python: python,
+        config: m,
+    }
+}
+
+fn main() {
+    let config = get_python_config();
+
+    println!("Using Python: {}", config.python);
+
+    let prefix = config.config.get("prefix").unwrap();
+
+    println!("Prefix: {}", prefix);
+
+    let enable_shared = match config.config.get("Py_ENABLE_SHARED") {
+        Some(value) => value == "1",
+        None => false,
+    };
+
+    if !enable_shared {
+        panic!("Detected Python lacks a shared library, which is required");
+    }
+
+    // If building standalone Mercurial, add an extra link path for
+    // native libraries.
+    if let Some(lib_path) = env::var_os("HG_STANDALONE_LINK_PATH") {
+        println!("cargo:rustc-link-search=native={}", lib_path.to_str().unwrap());
+    }
+}
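Review note on `get_python_config()`: the parsing loop splits the script output on a separator and then on the first `=`. The sketch below shows one defensive variant of that loop, assuming we would rather skip malformed entries than panic and drop the trailing newline that `print` appends. `parse_config`, the `<SEP>` separator and the sample string are made up for illustration and are not taken from the patch.

```rust
use std::collections::HashMap;

/// Split `output` on `separator` and parse each `key=value` entry.
/// Entries without `=` are skipped instead of causing a panic.
fn parse_config(output: &str, separator: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    // Trim trailing whitespace so the last value does not keep the newline.
    for entry in output.trim_end().split(separator) {
        // Split on the first `=` only, so values may themselves contain `=`.
        let mut parts = entry.splitn(2, '=');
        if let (Some(key), Some(value)) = (parts.next(), parts.next()) {
            map.insert(key.to_string(), value.to_string());
        }
    }
    map
}

fn main() {
    let sample = "prefix=/hgpython<SEP>Py_ENABLE_SHARED=1<SEP>CFLAGS=-O2 -DX=1\n";
    let config = parse_config(sample, "<SEP>");
    println!("{:?}", config.get("prefix"));
}
```

Passing the separator as a parameter also avoids repeating the literal in both the Rust code and the embedded Python script.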
diff --git a/rust/hgcli/Cargo.toml b/rust/hgcli/Cargo.toml
new file mode 100644
--- /dev/null
+++ b/rust/hgcli/Cargo.toml
@@ -0,0 +1,21 @@ 
+[package]
+name = "hgcli"
+version = "0.1.0"
+authors = ["Gregory Szorc <gregory.szorc@gmail.com>"]
+
+build = "build.rs"
+
+[[bin]]
+name = "hg"
+path = "src/main.rs"
+
+[dependencies]
+libc = "0.2.34"
+which = "1.0.3"
+
+[dependencies.cpython]
+version = "0.1"
+default-features = false
+features = ["python27-sys"]
+git = "https://github.com/dgrunwald/rust-cpython.git"
+rev = "b35031e2670d7571f03c313cde8fd91105bd5322"
diff --git a/rust/README.rst b/rust/README.rst
new file mode 100644
--- /dev/null
+++ b/rust/README.rst
@@ -0,0 +1,32 @@ 
+===================
+Mercurial Rust Code
+===================
+
+This directory contains Rust code for the Mercurial project.
+
+The top-level ``Cargo.toml`` file defines a workspace containing
+all primary Mercurial crates.
+
+Building
+========
+
+To build the Rust components::
+
+   $ cargo build
+
+If you prefer a non-debug / release configuration::
+
+   $ cargo build --release
+
+Running
+=======
+
+The ``hgcli`` crate produces an ``hg`` binary. You can run this binary
+via ``cargo run``::
+
+   $ cargo run --manifest-path hgcli/Cargo.toml
+
+Or directly::
+
+   $ target/debug/hg
+   $ target/release/hg
diff --git a/rust/Cargo.toml b/rust/Cargo.toml
new file mode 100644
--- /dev/null
+++ b/rust/Cargo.toml
@@ -0,0 +1,2 @@ 
+[workspace]
+members = ["hgcli"]
diff --git a/rust/Cargo.lock b/rust/Cargo.lock
new file mode 100644
--- /dev/null
+++ b/rust/Cargo.lock
@@ -0,0 +1,136 @@ 
+[[package]]
+name = "aho-corasick"
+version = "0.5.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "cpython"
+version = "0.1.0"
+source = "git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322#b35031e2670d7571f03c313cde8fd91105bd5322"
+dependencies = [
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+ "num-traits 0.1.41 (registry+https://github.com/rust-lang/crates.io-index)",
+ "python27-sys 0.1.2 (git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322)",
+]
+
+[[package]]
+name = "hgcli"
+version = "0.1.0"
+dependencies = [
+ "cpython 0.1.0 (git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322)",
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+ "which 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "kernel32-sys"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
+ "winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "libc"
+version = "0.2.34"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[[package]]
+name = "memchr"
+version = "0.1.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "num-traits"
+version = "0.1.41"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[[package]]
+name = "python27-sys"
+version = "0.1.2"
+source = "git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322#b35031e2670d7571f03c313cde8fd91105bd5322"
+dependencies = [
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+ "regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "regex"
+version = "0.1.80"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
+ "memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
+ "regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)",
+ "thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",
+ "utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "regex-syntax"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[[package]]
+name = "thread-id"
+version = "2.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "thread_local"
+version = "0.2.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "utf8-ranges"
+version = "0.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[[package]]
+name = "which"
+version = "1.0.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
+name = "winapi"
+version = "0.2.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[[package]]
+name = "winapi-build"
+version = "0.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+
+[metadata]
+"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
+"checksum cpython 0.1.0 (git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322)" = "<none>"
+"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
+"checksum libc 0.2.34 (registry+https://github.com/rust-lang/crates.io-index)" = "36fbc8a8929c632868295d0178dd8f63fc423fd7537ad0738372bd010b3ac9b0"
+"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
+"checksum num-traits 0.1.41 (registry+https://github.com/rust-lang/crates.io-index)" = "cacfcab5eb48250ee7d0c7896b51a2c5eec99c1feea5f32025635f5ae4b00070"
+"checksum python27-sys 0.1.2 (git+https://github.com/dgrunwald/rust-cpython.git?rev=b35031e2670d7571f03c313cde8fd91105bd5322)" = "<none>"
+"checksum regex 0.1.80 (registry+https://github.com/rust-lang/crates.io-index)" = "4fd4ace6a8cf7860714a2c2280d6c1f7e6a413486c13298bbc86fd3da019402f"
+"checksum regex-syntax 0.3.9 (registry+https://github.com/rust-lang/crates.io-index)" = "f9ec002c35e86791825ed294b50008eea9ddfc8def4420124fbc6b08db834957"
+"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
+"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
+"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
+"checksum which 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "4be6cfa54dab45266e98b5d7be2f8ce959ddd49abd141a05d52dce4b07f803bb"
+"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
+"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
diff --git a/rust/.hgignore b/rust/.hgignore
new file mode 100644
--- /dev/null
+++ b/rust/.hgignore
@@ -0,0 +1 @@ 
+target/
diff --git a/contrib/build-standalone.py b/contrib/build-standalone.py
new file mode 100755
--- /dev/null
+++ b/contrib/build-standalone.py
@@ -0,0 +1,390 @@ 
+#!/usr/bin/env python2.7
+# build-standalone.py - Create a standalone distribution of Mercurial.
+#
+# Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+"""Create a standalone Mercurial distribution.
+
+This script does the bulk of the work for creating a standalone Mercurial
+distribution.
+"""
+
+import errno
+import gzip
+import hashlib
+import io
+import multiprocessing
+import os
+import shutil
+import stat
+import subprocess
+import sys
+import tarfile
+import tempfile
+import urllib2
+
+try:
+    import lzma
+except ImportError:
+    lzma = None
+
+
+PYTHON_ARCHIVES = {
+    'version': '2.7.14',
+    'url': 'https://www.python.org/ftp/python/{version}/{prefix}-{version}.{suffix}',
+    'gz': {
+        'sha256': '304c9b202ea6fbd0a4a8e0ad3733715fbd4749f2204a9173a58ec53c32ea73e8',
+        'prefix': 'Python',
+        'suffix': 'tgz',
+        'tar_mode': 'r:gz',
+    },
+    'xz': {
+        'sha256': '71ffb26e09e78650e424929b2b457b9c912ac216576e6bd9e7d204ed03296a66',
+        'prefix': 'Python',
+        'suffix': 'tar.xz',
+        'tar_mode': 'r:xz',
+    },
+    'msi32': {
+        'sha256': '450bde0540341d4f7a6ad2bb66639fd3fac1c53087e9844dc34ddf88057a17ca',
+        'prefix': 'python',
+        'suffix': 'msi',
+    },
+    'msi64': {
+        'sha256': 'af293df7728b861648162ba0cd4a067299385cb6a3f172569205ac0b33190693',
+        'prefix': 'python',
+        'suffix': 'amd64.msi',
+    }
+}
+
+
+def hash_file(fh):
+    hasher = hashlib.sha256()
+    while True:
+        chunk = fh.read(16384)
+        if not chunk:
+            break
+
+        hasher.update(chunk)
+
+    return hasher.hexdigest()
+
+
+def makedirs(path):
+    try:
+        os.makedirs(path)
+    except OSError as e:
+        if e.errno != errno.EEXIST:
+            raise
+
+
+def _ensure_python_source(dest_dir):
+    """Ensure the Python source code is extracted to a path."""
+    makedirs(dest_dir)
+
+    if lzma:
+        archive = PYTHON_ARCHIVES['xz']
+    else:
+        archive = PYTHON_ARCHIVES['gz']
+
+    archive_path = os.path.join(dest_dir,
+                                'python-%s.%s' % (PYTHON_ARCHIVES['version'],
+                                                  archive['suffix']))
+
+    if os.path.exists(archive_path):
+        with open(archive_path, 'rb') as fh:
+            if hash_file(fh) != archive['sha256']:
+                print('%s has unexpected hash; removing' % archive_path)
+                os.unlink(archive_path)
+
+    if not os.path.exists(archive_path):
+        url = PYTHON_ARCHIVES['url'].format(
+            version=PYTHON_ARCHIVES['version'],
+            prefix=archive['prefix'],
+            suffix=archive['suffix'])
+
+        print('downloading %s' % url)
+
+        req = urllib2.urlopen(url)
+        if req.getcode() != 200:
+            raise Exception('non-200 HTTP response downloading Python: %d' % req.getcode())
+
+        buf = io.BytesIO()
+        while True:
+            chunk = req.read(16384)
+            if not chunk:
+                break
+            buf.write(chunk)
+
+        buf.seek(0)
+        if hash_file(buf) != archive['sha256']:
+            raise Exception('Python hash mismatch')
+
+        buf.seek(0)
+        with open(archive_path, 'wb') as fh:
+            fh.write(buf.getvalue())
+
+    # Assume if a single file from the archive is present that we don't need
+    # to re-extract.
+    if os.path.exists(os.path.join(dest_dir, 'configure')):
+        print('extracted python source code found; using without modifications')
+        return
+
+    print('extracting %s to %s' % (archive_path, dest_dir))
+    with tarfile.open(archive_path, archive['tar_mode']) as tf:
+        prefix = 'Python-%s' % PYTHON_ARCHIVES['version']
+        for ti in tf:
+            assert ti.name.startswith(prefix)
+            ti.name = ti.name[len(prefix):].lstrip('/')
+            tf.extract(ti, dest_dir)
+
+
+def _build_python(state):
+    source_dir = state['python_source_dir']
+    build_dir = state['python_build_dir']
+    _ensure_python_source(source_dir)
+
+    makedirs(build_dir)
+
+    # TODO use a more sensible filesystem layout for Python in cases
+    # where the files will be installed alongside other system files
+    # (e.g. when producing deb or rpm archives).
+    if not os.path.exists(os.path.join(build_dir, 'config.status')):
+        subprocess.check_call([
+            os.path.join(source_dir, 'configure'),
+            '--prefix', '/hgpython',
+            '--enable-shared',
+            '--enable-unicode=ucs4',
+            # TODO enable optimizations
+            # '--enable-optimizations',
+            # '--enable-lto',
+        ], cwd=build_dir)
+
+    subprocess.check_call([
+        'make', '-j%d' % multiprocessing.cpu_count(),
+    ], cwd=build_dir)
+
+
+def install_python(state):
+    """Installs Python in the standalone directory.
+
+    Python is installed to the `hgpython/` sub-directory. The layout of
+    this directory resembles a typical Python distribution. In fact, the
+    Python installation could be used on its own, just like any other
+    Python installation.
+    """
+    # TODO on Windows, obtain Python files from official, self-contained
+    # binary distribution (via an MSI).
+    _build_python(state)
+
+    build_dir = state['python_build_dir']
+    py_dir = state['python_install_dir']
+
+    if os.path.exists(os.path.join(py_dir, 'bin', 'python')):
+        print('python already installed in %s; skipping `make install`' %
+              py_dir)
+    else:
+        subprocess.check_call([
+            'make',
+            '-j%d' % multiprocessing.cpu_count(),
+            'install',
+            'DESTDIR=%s' % state['install_dir'],
+        ], cwd=build_dir)
+
+    # Update shared library references to be relative to binary.
+    # TODO compile Python in such a way that this isn't necessary.
+    if sys.platform.startswith('linux'):
+        subprocess.check_call([
+            'patchelf',
+            '--set-rpath',
+            '$ORIGIN/../lib',
+            state['python_bin'],
+        ])
+    elif sys.platform == 'darwin':
+        subprocess.check_call([
+            'install_name_tool', '-change',
+            '/hgpython/lib/libpython2.7.dylib',
+            '@loader_path/../lib/libpython2.7.dylib',
+            state['python_bin'],
+        ])
+
+
+def install_rust_components(state):
+    rust_dir = os.path.join(state['root_dir'], 'rust', 'hgcli')
+
+    env = dict(os.environ)
+
+    # Tell cpython's build.rs to use our Python binary.
+    env['PYTHON_SYS_EXECUTABLE'] = os.path.join(
+        state['python_install_dir'], 'bin', 'python2.7')
+
+    # Tell our build.rs where to find libpython.
+    env['HG_STANDALONE_LINK_PATH'] = os.path.join(
+        state['python_install_dir'], 'lib')
+
+    subprocess.check_call(['cargo', 'build', '--release', '-v'],
+                          cwd=rust_dir, env=env)
+
+    subprocess.check_call([
+        'cargo',
+        'install',
+        '--force',
+        '--root', state['install_dir'],
+    ], cwd=rust_dir, env=env)
+
+    # TODO figure out how to link properly via Cargo.
+    # Adjust rpath so libpython is loaded from a relative path.
+    if sys.platform.startswith('linux'):
+        subprocess.check_call([
+            'patchelf',
+            '--set-rpath',
+            '$ORIGIN/../hgpython/lib',
+            state['hg_bin'],
+        ])
+    elif sys.platform == 'darwin':
+        subprocess.check_call([
+            'install_name_tool', '-change',
+            '/System/Library/Frameworks/Python.framework/Versions/2.7/Python',
+            '@loader_path/../hgpython/lib/libpython2.7.dylib',
+            state['hg_bin'],
+        ])
+
+def install_mercurial(state):
+    """Install Mercurial files into the distribution."""
+    install_dir = state['install_dir']
+    python = os.path.join(state['python_install_dir'], 'bin', 'python')
+
+    temp_dir = tempfile.mkdtemp(dir=state['build_dir'])
+    try:
+        subprocess.check_call([
+            python, 'setup.py',
+            'build',
+            'install',
+                # These are the only files we care about.
+                '--install-lib', os.path.join(install_dir, 'mercurial'),
+
+                '--install-data', os.path.join(temp_dir, 'data'),
+                '--install-headers', os.path.join(temp_dir, 'headers'),
+                '--install-platlib', os.path.join(temp_dir, 'platlib'),
+                '--install-purelib', os.path.join(temp_dir, 'purelib'),
+                # `hg` is replaced by our binary version.
+                '--install-scripts', os.path.join(temp_dir, 'bin'),
+            ],
+            cwd=state['root_dir'])
+    finally:
+        temp_files = set()
+        for root, dirs, files in os.walk(temp_dir):
+            for f in files:
+                full = os.path.join(root, f)
+                temp_files.add(full[len(temp_dir)+1:])
+
+        shutil.rmtree(temp_dir)
+
+        expected = {
+            'bin/hg',
+        }
+        extra = temp_files - expected
+        if extra:
+            raise Exception('unknown extra files were installed: %s' %
+                            ', '.join(sorted(extra)))
+
+
+def _run_hg(state, args):
+    env = dict(os.environ)
+    env['HGPLAIN'] = '1'
+    env['HGRCPATH'] = ''
+
+    with open(os.devnull, 'wb') as devnull:
+        return subprocess.check_output([state['hg_bin']] + args,
+                                       env=env,
+                                       stderr=devnull)
+
+def verify_hg(state):
+    print('running `hg version`')
+    try:
+        print(_run_hg(state, ['version']))
+    except subprocess.CalledProcessError as e:
+        print('error invoking `hg version`')
+        print(e.output)
+        sys.exit(1)
+
+
+def get_revision_info(state):
+    res = _run_hg(state, ['-R', state['root_dir'], 'log', '-r', '.', '-T', '{node} {date}'])
+    node, date = res.split(' ')
+    return node, int(float(date))
+
+
+def _get_archive_files(state):
+    # Ideally we wouldn't have any ignores.
+    IGNORE = {
+        '.crates.toml',
+    }
+
+    for root, dirs, files in os.walk(state['install_dir']):
+        # sorts are here for determinism.
+        dirs.sort()
+        for f in sorted(files):
+            full = os.path.join(root, f)
+            rel = full[len(state['install_dir']) + 1:]
+
+            if rel in IGNORE:
+                continue
+
+            yield full, rel
+
+
+def create_tar(state, ts):
+    print('writing %s' % state['tar_path'])
+    with tarfile.TarFile(state['tar_path'], 'w') as tf:
+        for full, rel in _get_archive_files(state):
+            with open(full, 'rb') as fh:
+                ti = tf.gettarinfo(full, rel)
+
+                if ti.mode & (stat.S_ISUID | stat.S_ISGID):
+                    print('setuid or setgid bits set: %s' % full)
+
+                # Normalize mtime to commit time.
+                ti.mtime = ts
+                # Normalize uid/gid to root:root.
+                ti.uid = 0
+                ti.gid = 0
+                ti.uname = ''
+                ti.gname = ''
+
+                tf.addfile(ti, fh)
+
+    #gz = state['tar_path'] + '.gz'
+    #print('writing %s' % gz)
+    #with open(state['tar_path'], 'rb') as ifh, gzip.GzipFile(gz, 'wb') as ofh:
+    #    shutil.copyfileobj(ifh, ofh)
+
+
+if __name__ == '__main__':
+    root = os.path.normpath(os.path.join(os.path.dirname(__file__), '..'))
+    root = os.path.abspath(root)
+    build_dir = os.path.join(root, 'build')
+
+    python_install_dir = os.path.join(build_dir, 'standalone', 'hgpython')
+
+    state = {
+        'root_dir': root,
+        'build_dir': build_dir,
+        'install_dir': os.path.join(build_dir, 'standalone'),
+        'python_source_dir': os.path.join(build_dir, 'python-src'),
+        'python_build_dir': os.path.join(build_dir, 'python-build'),
+        'python_install_dir': python_install_dir,
+        'python_bin': os.path.join(python_install_dir, 'bin', 'python2.7'),
+        'hg_bin': os.path.join(build_dir, 'standalone', 'bin', 'hg'),
+        'tar_path': os.path.join(build_dir, 'standalone.tar'),
+    }
+
+    makedirs(state['install_dir'])
+    install_python(state)
+    install_rust_components(state)
+    install_mercurial(state)
+    verify_hg(state)
+    node, ts = get_revision_info(state)
+    create_tar(state, ts)
diff --git a/contrib/STANDALONE-MERCURIAL.rst b/contrib/STANDALONE-MERCURIAL.rst
new file mode 100644
--- /dev/null
+++ b/contrib/STANDALONE-MERCURIAL.rst
@@ -0,0 +1,70 @@ 
+====================
+Standalone Mercurial
+====================
+
+*Standalone Mercurial* is a generic term given to a distribution
+of Mercurial that is self-contained and has minimal dependencies on
+the host (typically just the C runtime library). Most of
+Mercurial's dependencies, including a Python interpreter, are
+bundled in the distribution.
+
+Architecture
+============
+
+A standalone Mercurial distribution essentially consists of the
+following elements:
+
+* An `hg` binary executable
+* A Python interpreter shared library
+* The Python standard library
+* 3rd party Python packages to enhance the Mercurial experience
+* Mercurial's Python packages
+* Mercurial support files (help content, default config files, etc)
+* Any additional support files (e.g. shared library dependencies)
+
+At a high level, the `hg` binary has a shared library dependency
+on `libpython`. The binary is configured to load the `libpython`
+that ships with the Mercurial distribution. When started, the
+`hg` binary assesses its state, configures an embedded Python
+interpreter, and essentially invokes Mercurial's `main()` function.
+
+Build Requirements
+==================
+
+Universal
+---------
+
+* Python 2.7 (to run the build script)
+* A working Rust and Cargo installation
+
+Linux
+-----
+
+* Dependencies to build Python 2.7 from source (GNU make, autoconf,
+  various dependencies for extensions)
+* The `patchelf` tool
+
+macOS
+-----
+
+* Xcode
+
+Windows
+-------
+
+* Microsoft Visual C++ Compiler for Python 2.7 (https://www.microsoft.com/en-us/download/details.aspx?id=44266)
+
+Building
+========
+
+To build standalone Mercurial, run the following::
+
+   $ python2.7 contrib/build-standalone.py
+
+This will:
+
+1. Obtain a Python distribution (either by compiling from source
+   or downloading a pre-built distribution)
+2. Build Mercurial Rust components
+3. Build Mercurial Python components
+4. Produce an *archive* suitable for distribution
diff --git a/.hgignore b/.hgignore
--- a/.hgignore
+++ b/.hgignore
@@ -66,3 +66,5 @@ 
 # hackable windows distribution additions
 ^hg-python
 ^hg.py$
+
+subinclude:rust/.hgignore