Patchwork D7871: rust-utils: add util for canonical path

login
register
mail settings
Submitter phabricator
Date Jan. 14, 2020, 5:35 p.m.
Message ID <differential-rev-PHID-DREV-du2y5nvrvr43bahbwaia-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44337/
State Superseded
Headers show

Comments

phabricator - Jan. 14, 2020, 5:35 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

AFFECTED FILES
  rust/Cargo.lock
  rust/hg-core/Cargo.toml
  rust/hg-core/src/utils/files.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 5:39 p.m.
This revision now requires changes to proceed.
martinvonz added inline comments.
martinvonz requested changes to this revision.

INLINE COMMENTS

> files.rs:207-216
> +    let name = if !name.is_absolute() {
> +        root.join(&cwd).join(&name)
> +    } else {
> +        name.to_owned()
> +    };
> +    let mut auditor = PathAuditor::new(&root);
> +    if name != root && name.starts_with(&root) {

Might be worth having a fast path for when `name` is not absolute (including "") and neither `name` nor `cwd` contains "../"so we don't do `let name = root.join(&cwd).join(&name)` immediately followed by `let name = name.strip_prefix(&root).unwrap()`. Especially since that seems like the common case. Or can that ever produce different results? Anyway, that can be done later.

> files.rs:220-222
> +        // Determine whether `name' is in the hierarchy at or beneath `root',
> +        // by iterating name=name.parent() until that causes no change (can't
> +        // check name == '/', because that doesn't work on windows).

Is this accurate? It looks like we break when `name.parent() == None`.

> files.rs:227-230
> +                if name.components().next().is_none() {
> +                    // `name` was actually the same as root (maybe a symlink)
> +                    return Ok("".into());
> +                }

`name.components().next()` will be `None` only if `name` is something like "/", right? But then we shouldn't return "" (a repo-relative path), we should error out.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 9:21 p.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in files.rs:207-216
> Might be worth having a fast path for when `name` is not absolute (including "") and neither `name` nor `cwd` contains "../"so we don't do `let name = root.join(&cwd).join(&name)` immediately followed by `let name = name.strip_prefix(&root).unwrap()`. Especially since that seems like the common case. Or can that ever produce different results? Anyway, that can be done later.

That might be worth investigating indeed. I will defer to a later time, though. :)

> martinvonz wrote in files.rs:220-222
> Is this accurate? It looks like we break when `name.parent() == None`.

I could reword this sentence if you feel that it does not convey the same meaning. It fits the Python implementation better, that's for sure.

> martinvonz wrote in files.rs:227-230
> `name.components().next()` will be `None` only if `name` is something like "/", right? But then we shouldn't return "" (a repo-relative path), we should error out.

For `b'/'` it would return `Some(std::Path::Components::RootDir)`. It only returns `None` if there are no components.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 9:28 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> Alphare wrote in files.rs:220-222
> I could reword this sentence if you feel that it does not convey the same meaning. It fits the Python implementation better, that's for sure.

To me, the current sentence very much makes it sound like it iterates until `name != name.parent()`, which doesn't seem correct. So, yes, please reword it. (Unless I'm just misunderstanding something.)

> Alphare wrote in files.rs:227-230
> For `b'/'` it would return `Some(std::Path::Components::RootDir)`. It only returns `None` if there are no components.

Okay, but it seems that that doesn't address my concern. If I understand correctly (now that you've corrected me), we get here if `name == ""`. However, name is an absolute path (right?), so with `root = "/some/path/"` and `name = ""`, why should we return `Ok()`? If `name` is indeed an absolute path, I would interpret "" as the root (file system root, not repo root), which is not inside "/some/path". What am I missing?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 10:20 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> files.rs:231
> +                }
> +                auditor.audit_path(path_to_hg_path_buf(name)?)?;
> +                return Ok(name.to_owned());

Related to the comment above, this seems to (generally) return an absolute parent directory instead of a repo-relative path. Try adding a `println!()` just inside the `if same` to make sure you have test coverage.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 10, 2020, 10:57 p.m.
This revision is now accepted and ready to land.
martinvonz added inline comments.
martinvonz accepted this revision.

INLINE COMMENTS

> files.rs:366
> +        // TODO make portable
> +        std::os::unix::fs::symlink(&root, &out_of_repo).unwrap();
> +

Can the test method be annotated so it's only run on unix?

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 10, 2020, 11:40 p.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in files.rs:366
> Can the test method be annotated so it's only run on unix?

Yes, but I think we need to have a more general chat about this subject. A non-negligible number of `hg-core`'s features do not work under non-UNIX systems. Our module policy system does not have the fine-grained control needed to map for a more complexe set of `features` (in Cargo terms). I think the more reasonable option right now is to have a compilation failure. Maybe we should have a big gated `compile_error!` at the top of `lib.rs` to inform users that there is work to be done to be cross-platform?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7871/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7871

To: Alphare, #hg-reviewers, kevincox, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/utils/files.rs b/rust/hg-core/src/utils/files.rs
--- a/rust/hg-core/src/utils/files.rs
+++ b/rust/hg-core/src/utils/files.rs
@@ -9,13 +9,17 @@ 
 
 //! Functions for fiddling with files.
 
-use crate::utils::hg_path::{HgPath, HgPathBuf};
-
+use crate::utils::hg_path::{
+    path_to_hg_path_buf, HgPath, HgPathBuf, HgPathError,
+};
+use crate::utils::path_auditor::PathAuditor;
 use crate::utils::replace_slice;
 use lazy_static::lazy_static;
 use std::fs::Metadata;
 use std::iter::FusedIterator;
-use std::path::Path;
+use std::ops::Deref;
+use same_file::is_same_file;
+use std::path::{Path, PathBuf};
 
 pub fn get_path_from_bytes(bytes: &[u8]) -> &Path {
     let os_str;
@@ -272,9 +276,62 @@ 
     }
 }
 
+/// Returns the canonical path of `name`, given `cwd` and `root`
+pub fn canonical_path(
+    root: impl AsRef<Path>,
+    cwd: impl AsRef<Path>,
+    name: impl AsRef<Path>,
+) -> Result<PathBuf, HgPathError> {
+    // TODO add missing normalization for other platforms
+    let root = root.as_ref();
+    let cwd = cwd.as_ref();
+    let name = name.as_ref();
+
+    let name = if !name.is_absolute() {
+        root.join(&cwd).join(&name)
+    } else {
+        name.to_owned()
+    };
+    let mut auditor = PathAuditor::new(&root);
+    if name != root && name.starts_with(&root) {
+        let name = name.strip_prefix(&root).unwrap();
+        auditor.audit_path(path_to_hg_path_buf(name)?)?;
+        return Ok(name.to_owned());
+    } else if name == root {
+        return Ok("".into());
+    } else {
+        // Determine whether `name' is in the hierarchy at or beneath `root',
+        // by iterating name=name.parent() until that causes no change (can't
+        // check name == '/', because that doesn't work on windows).
+        let mut name = name.deref();
+        loop {
+            let same = is_same_file(&name, &root).unwrap_or(false);
+            if same {
+                if name.components().next().is_none() {
+                    // `name` was actually the same as root (maybe a symlink)
+                    return Ok("".into());
+                }
+                auditor.audit_path(path_to_hg_path_buf(name)?)?;
+                return Ok(name.to_owned());
+            }
+            name = match name.parent() {
+                None => break,
+                Some(p) => p,
+            };
+        }
+        // TODO hint to the user about using --cwd
+        // Bubble up the responsibility to Python for now
+        Err(HgPathError::NotUnderRoot {
+            path: name.to_owned(),
+            root: root.to_owned(),
+        })
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
+    use pretty_assertions::assert_eq;
 
     #[test]
     fn find_dirs_some() {
@@ -415,4 +472,53 @@ 
         assert_eq!(dirs.next(), None);
         assert_eq!(dirs.next(), None);
     }
+
+    #[test]
+    fn test_canonical_path() {
+        let root = Path::new("/repo");
+        let cwd = Path::new("/dir");
+        let name = Path::new("filename");
+        assert_eq!(
+            canonical_path(root, cwd, name),
+            Err(HgPathError::NotUnderRoot {
+                path: PathBuf::from("/"),
+                root: root.to_path_buf()
+            })
+        );
+
+        let root = Path::new("/repo");
+        let cwd = Path::new("/");
+        let name = Path::new("filename");
+        assert_eq!(
+            canonical_path(root, cwd, name),
+            Err(HgPathError::NotUnderRoot {
+                path: PathBuf::from("/"),
+                root: root.to_path_buf()
+            })
+        );
+
+        let root = Path::new("/repo");
+        let cwd = Path::new("/");
+        let name = Path::new("repo/filename");
+        assert_eq!(
+            canonical_path(root, cwd, name),
+            Ok(PathBuf::from("filename"))
+        );
+
+        let root = Path::new("/repo");
+        let cwd = Path::new("/repo");
+        let name = Path::new("filename");
+        assert_eq!(
+            canonical_path(root, cwd, name),
+            Ok(PathBuf::from("filename"))
+        );
+
+        let root = Path::new("/repo");
+        let cwd = Path::new("/repo/subdir");
+        let name = Path::new("filename");
+        assert_eq!(
+            canonical_path(root, cwd, name),
+            Ok(PathBuf::from("subdir/filename"))
+        );
+    }
 }
diff --git a/rust/hg-core/Cargo.toml b/rust/hg-core/Cargo.toml
--- a/rust/hg-core/Cargo.toml
+++ b/rust/hg-core/Cargo.toml
@@ -17,7 +17,8 @@ 
 rayon = "1.2.0"
 regex = "1.1.0"
 twox-hash = "1.5.0"
+same-file = "1.0.6"
 
 [dev-dependencies]
 tempfile = "3.1.0"
 pretty_assertions = "0.6.1"
\ No newline at end of file
diff --git a/rust/Cargo.lock b/rust/Cargo.lock
--- a/rust/Cargo.lock
+++ b/rust/Cargo.lock
@@ -157,6 +157,7 @@ 
  "rand_pcg 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
  "rayon 1.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
  "regex 1.3.1 (registry+https://github.com/rust-lang/crates.io-index)",
+ "same-file 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)",
  "tempfile 3.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
  "twox-hash 1.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
 ]
@@ -481,6 +482,14 @@ 
 ]
 
 [[package]]
+name = "same-file"
+version = "1.0.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "winapi-util 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
 name = "scopeguard"
 version = "1.0.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -562,6 +571,14 @@ 
 source = "registry+https://github.com/rust-lang/crates.io-index"
 
 [[package]]
+name = "winapi-util"
+version = "0.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+dependencies = [
+ "winapi 0.3.8 (registry+https://github.com/rust-lang/crates.io-index)",
+]
+
+[[package]]
 name = "winapi-x86_64-pc-windows-gnu"
 version = "0.4.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
@@ -622,6 +639,7 @@ 
 "checksum regex-syntax 0.6.12 (registry+https://github.com/rust-lang/crates.io-index)" = "11a7e20d1cce64ef2fed88b66d347f88bd9babb82845b2b858f3edbf59a4f716"
 "checksum remove_dir_all 0.5.2 (registry+https://github.com/rust-lang/crates.io-index)" = "4a83fa3702a688b9359eccba92d153ac33fd2e8462f9e0e3fdf155239ea7792e"
 "checksum rustc_version 0.2.3 (registry+https://github.com/rust-lang/crates.io-index)" = "138e3e0acb6c9fb258b19b67cb8abd63c00679d2851805ea151465464fe9030a"
+"checksum same-file 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)" = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502"
 "checksum scopeguard 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b42e15e59b18a828bbf5c58ea01debb36b9b096346de35d941dcb89009f24a0d"
 "checksum semver 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)" = "1d7eb9ef2c18661902cc47e535f9bc51b78acd254da71d375c2f6720d9a40403"
 "checksum semver-parser 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "388a1df253eca08550bef6c72392cfe7c30914bf41df5269b68cbd6ff8f570a3"
@@ -633,4 +651,5 @@ 
 "checksum wasi 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b89c3ce4ce14bdc6fb6beaf9ec7928ca331de5df7e5ea278375642a2f478570d"
 "checksum winapi 0.3.8 (registry+https://github.com/rust-lang/crates.io-index)" = "8093091eeb260906a183e6ae1abdba2ef5ef2257a21801128899c3fc699229c6"
 "checksum winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
+"checksum winapi-util 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "4ccfbf554c6ad11084fb7517daca16cfdcaccbdadba4fc336f032a8b12c2ad80"
 "checksum winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"