Patchwork D7927: rust-status: add util for listing a directory

login
register
mail settings
Submitter phabricator
Date Jan. 17, 2020, 3:33 p.m.
Message ID <differential-rev-PHID-DREV-tyufzub2it62rtwcvyt3-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44488/
State New
Headers show

Comments

phabricator - Jan. 17, 2020, 3:33 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  I debated moving it to utils, but it is not used anywhere else for now, and
  its skip behavior is pretty specific to status.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

AFFECTED FILES
  rust/hg-core/src/dirstate/status.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 5, 2020, 10:31 p.m.
marmoute added a comment.
marmoute accepted this revision.


  I am not a rust expert, but I don't see anything obviously wrong in there. I make a couple of minor comments.

INLINE COMMENTS

> status.rs:68
> +        if skip_dot_hg && filename.as_bytes() == b".hg" && file_type.is_dir() {
> +            return Ok(vec![]);
> +        } else {

note: took me some time to realize that skipping the full directory if there is a `.hg` make sense. Because we detected a another repository and that the `wc` is part of that repository.

Maybe follow up with a documentation update that clarify this.

> status.rs:74
> +
> +    results.sort_by(|a, b| a.0.cmp(&b.0));
> +    Ok(results)

Silly question: would it make sense to sort the filename before the loop ? It might result is less bytes to move around since the entry will not be attached yet.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 3:02 p.m.
Alphare added inline comments.
Alphare marked an inline comment as done.

INLINE COMMENTS

> marmoute wrote in status.rs:74
> Silly question: would it make sense to sort the filename before the loop ? It might result is less bytes to move around since the entry will not be attached yet.

I'm not entirely sure I understand your question.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 6, 2020, 7:46 p.m.
marmoute added inline comments.

INLINE COMMENTS

> Alphare wrote in status.rs:74
> I'm not entirely sure I understand your question.

If I understand you code correctly you currently do:

  filenames = listdir()
  result = [(f, entry(f)) for f in filenames]
  result.sort()

Would it make sense to do:

  filenames = listdir()
  filenames.sort()
  result = [(f, entry(f)) for f in filenames]

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 10, 2020, 4:11 p.m.
Alphare added inline comments.

INLINE COMMENTS

> marmoute wrote in status.rs:74
> If I understand you code correctly you currently do:
> 
>   filenames = listdir()
>   result = [(f, entry(f)) for f in filenames]
>   result.sort()
> 
> Would it make sense to do:
> 
>   filenames = listdir()
>   filenames.sort()
>   result = [(f, entry(f)) for f in filenames]

std::fs::read_dir <https://doc.rust-lang.org/nightly/std/fs/fn.read_dir.html> returns an iterator, so I would have to collect it first, unsure if that would be more expensive... and also, a `Vec` of heap-allocated elements being just an array of pointers I'm not convinced that would help. Maybe we can do some benchmarking further down the line.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 12, 2020, 10:09 a.m.
kevincox added inline comments.
kevincox accepted this revision.

INLINE COMMENTS

> status.rs:76
> +
> +    results.sort_by(|a, b| a.0.cmp(&b.0));
> +    Ok(results)

It would be more clear to do `.sort_by_key(|e| e.0)`

And I don't think stability matters here so you could use `sort_unstable_by_key`.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute, kevincox
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 12, 2020, 2:02 p.m.
Alphare added inline comments.

INLINE COMMENTS

> kevincox wrote in status.rs:76
> It would be more clear to do `.sort_by_key(|e| e.0)`
> 
> And I don't think stability matters here so you could use `sort_unstable_by_key`.

I didn't realize that `sort_unstable_by_key` needed to return a `T` and not a `&T`, that forces a `clone()`. I'm not sure that matters, we'll go with the shortest code and see if that shows up in profiling later.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute, kevincox
Cc: marmoute, durin42, kevincox, mercurial-devel
phabricator - Feb. 12, 2020, 4:02 p.m.
kevincox added inline comments.

INLINE COMMENTS

> Alphare wrote in status.rs:76
> I didn't realize that `sort_unstable_by_key` needed to return a `T` and not a `&T`, that forces a `clone()`. I'm not sure that matters, we'll go with the shortest code and see if that shows up in profiling later.

`sort_unstable_by_key` doesn't require the function to return a T at all. It just needs to return an `Ord` type.  Note that `&Ord` is `Ord` automatically.

https://doc.rust-lang.org/std/primitive.slice.html#method.sort_unstable_by_key

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7927/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7927

To: Alphare, #hg-reviewers, marmoute, kevincox
Cc: marmoute, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/dirstate/status.rs b/rust/hg-core/src/dirstate/status.rs
--- a/rust/hg-core/src/dirstate/status.rs
+++ b/rust/hg-core/src/dirstate/status.rs
@@ -14,12 +14,15 @@ 
     matchers::Matcher,
     utils::{
         files::HgMetadata,
-        hg_path::{hg_path_to_path_buf, HgPath},
+        hg_path::{
+            hg_path_to_path_buf, os_string_to_hg_path_buf, HgPath, HgPathBuf,
+        },
     },
     CopyMap, DirstateEntry, DirstateMap, EntryState,
 };
 use rayon::prelude::*;
 use std::collections::HashSet;
+use std::fs::{read_dir, DirEntry};
 use std::path::Path;
 
 /// Marker enum used to dispatch new status entries into the right collections.
@@ -48,6 +51,30 @@ 
     a & i32::max_value() != b & i32::max_value()
 }
 
+/// Return a sorted list containing information about the entries
+/// in the directory.
+fn list_directory(
+    path: impl AsRef<Path>,
+    skip_dot_hg: bool,
+) -> std::io::Result<Vec<(HgPathBuf, DirEntry)>> {
+    let mut results = vec![];
+    let entries = read_dir(path.as_ref())?;
+
+    for entry in entries {
+        let entry = entry?;
+        let filename = os_string_to_hg_path_buf(entry.file_name())?;
+        let file_type = entry.file_type()?;
+        if skip_dot_hg && filename.as_bytes() == b".hg" && file_type.is_dir() {
+            return Ok(vec![]);
+        } else {
+            results.push((HgPathBuf::from(filename), entry))
+        }
+    }
+
+    results.sort_by(|a, b| a.0.cmp(&b.0));
+    Ok(results)
+}
+
 /// The file corresponding to the dirstate entry was found on the filesystem.
 fn dispatch_found(
     filename: impl AsRef<HgPath>,