Patchwork D8215: rust-status: move to recursive traversal to prepare for parallel traversal

login
register
mail settings
Submitter phabricator
Date March 4, 2020, 3:29 p.m.
Message ID <differential-rev-PHID-DREV-vugrw6i5tdutrh6gsdqi-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/45463/
State Superseded
Headers show

Comments

phabricator - March 4, 2020, 3:29 p.m.
Alphare created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  I have looked into traversing the working directory in parallel either
  by a recursive or an iterative algorithm. The recursive approach won quite
  decisively both in terms of performance and code readability.
  
  You can look at my experiment here:
  https://heptapod.octobus.net/Alphare/rayon-recursive-traversal
  
  The chance of a stack overflow happening because the directories get too nested
  seems slim.
  
  This change does not yet do anything in parallel.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8215

AFFECTED FILES
  rust/hg-core/src/dirstate/status.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: mercurial-devel
phabricator - March 4, 2020, 9:30 p.m.
durin42 added a comment.


  I'm broadly happy with this series, but:
  
  1. I get test-check-rust-format.t errors
  2. I don't seem to be having any luck running the tests. It looks like the rust module policy doesn't cause the right things to get installed? In particular, `HGWITHRUSTEXT=cpython HGMODULEPOLICY=rust+c make clean tests` flunks `test-remotefilelog-datapack.py` because it can't import parsers.so? And it looks like `test-dirstate-race.t` is just broken by this series?
  3. The re2-missing fallback message breaks tests for users that want rust things, but lack re2. This isn't going to work, I'm afraid.
  
  The series is so big it's not practical for me to figure out which revision(s) I should be commenting on, so hopefully that's actionable feedback...

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8215/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8215

To: Alphare, #hg-reviewers
Cc: durin42, mercurial-devel
phabricator - March 5, 2020, 8:43 a.m.
Alphare added a comment.
Alphare added a subscriber: marmoute.


  In D8215#122325 <https://phab.mercurial-scm.org/D8215#122325>, @durin42 wrote:
  
  > 1. I get test-check-rust-format.t errors
  
  Indeed. It appears that none of the machines I use to run the entire test suite (so... not my laptop) have rustfmt installed, which skips this test. I have updated the series to fix this issue, and will install rustfmt where needed.
  
  > 2. I don't seem to be having any luck running the tests. It looks like the rust module policy doesn't cause the right things to get installed? In particular, `HGWITHRUSTEXT=cpython HGMODULEPOLICY=rust+c make clean tests` flunks `test-remotefilelog-datapack.py` because it can't import parsers.so? And it looks like `test-dirstate-race.t` is just broken by this series?
  
  I will look into this, this is bad.
  
  > 3. The re2-missing fallback message breaks tests for users that want rust things, but lack re2. This isn't going to work, I'm afraid.
  
  @marmoute suggested I do something in `debuginstall` to check for `re2` on the Rust side instead, I think it's a good idea.
  
  > The series is so big it's not practical for me to figure out which revision(s) I should be commenting on, so hopefully that's actionable feedback...
  
  It is, thanks!

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8215/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8215

To: Alphare, #hg-reviewers
Cc: marmoute, durin42, mercurial-devel
phabricator - March 6, 2020, 8:48 a.m.
Alphare added a comment.


  In D8215#122325 <https://phab.mercurial-scm.org/D8215#122325>, @durin42 wrote:
  
  > 
  
  And it looks like `test-dirstate-race.t` is just broken by this series?
  
  I can't reproduce that however. Do you have a particular way of doing so? Because it's a race test... it might be flaky.
  
  I still don't have a clear solution for the modulepolicy thing, because everything is broken if we use strict policies.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8215/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8215

To: Alphare, #hg-reviewers
Cc: marmoute, durin42, mercurial-devel
phabricator - March 6, 2020, 9:26 a.m.
Alphare added a comment.


  In D8215#122642 <https://phab.mercurial-scm.org/D8215#122642>, @Alphare wrote:
  
  > I still don't have a clear solution for the modulepolicy thing, because everything is broken if we use strict policies.
  
  It appears that I got confused by an unrelated issue. @marmoute pushed a fix for this as D8245 <https://phab.mercurial-scm.org/D8245>.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8215/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8215

To: Alphare, #hg-reviewers
Cc: marmoute, durin42, mercurial-devel

Patch

diff --git a/rust/hg-core/src/dirstate/status.rs b/rust/hg-core/src/dirstate/status.rs
--- a/rust/hg-core/src/dirstate/status.rs
+++ b/rust/hg-core/src/dirstate/status.rs
@@ -26,7 +26,6 @@ 
 };
 use lazy_static::lazy_static;
 use rayon::prelude::*;
-use std::collections::VecDeque;
 use std::{
     borrow::Cow,
     collections::HashSet,
@@ -299,48 +298,51 @@ 
     pub list_ignored: bool,
 }
 
-/// Dispatch a single file found during `traverse`.
-/// If `file` is a folder that needs to be traversed, it will be pushed into
+/// Dispatch a single entry (file, folder, symlink...) found during `traverse`.
+/// If the entry is a folder that needs to be traversed, it will be pushed into
 /// `work`.
-fn traverse_worker<'a>(
-    work: &mut VecDeque<HgPathBuf>,
-    matcher: &impl Matcher,
+fn handle_traversed_entry<'a>(
+    dir_entry: &DirEntry,
+    matcher: &(impl Matcher + Sync),
+    root_dir: impl AsRef<Path>,
     dmap: &DirstateMap,
     filename: impl AsRef<HgPath>,
-    dir_entry: &DirEntry,
-    ignore_fn: &impl for<'r> Fn(&'r HgPath) -> bool,
+    old_results: &FastHashMap<Cow<'a, HgPath>, Dispatch>,
+    ignore_fn: &(impl for<'r> Fn(&'r HgPath) -> bool + Sync),
     options: StatusOptions,
-) -> Option<IoResult<(Cow<'a, HgPath>, Dispatch)>> {
-    let file_type = match dir_entry.file_type() {
-        Ok(x) => x,
-        Err(e) => return Some(Err(e.into())),
-    };
+) -> IoResult<Vec<(Cow<'a, HgPath>, Dispatch)>> {
+    let file_type = dir_entry.file_type()?;
     let filename = filename.as_ref();
     let entry_option = dmap.get(filename);
 
     if file_type.is_dir() {
         // Do we need to traverse it?
         if !ignore_fn(&filename) || options.list_ignored {
-            work.push_front(filename.to_owned());
+            return traverse_dir(
+                matcher,
+                root_dir,
+                dmap,
+                filename.to_owned(),
+                &old_results,
+                ignore_fn,
+                options,
+            );
         }
         // Nested `if` until `rust-lang/rust#53668` is stable
         if let Some(entry) = entry_option {
             // Used to be a file, is now a folder
             if matcher.matches_everything() || matcher.matches(&filename) {
-                return Some(Ok((
+                return Ok(vec![(
                     Cow::Owned(filename.to_owned()),
                     dispatch_missing(entry.state),
-                )));
+                )]);
             }
         }
     } else if file_type.is_file() || file_type.is_symlink() {
         if let Some(entry) = entry_option {
             if matcher.matches_everything() || matcher.matches(&filename) {
-                let metadata = match dir_entry.metadata() {
-                    Ok(x) => x,
-                    Err(e) => return Some(Err(e.into())),
-                };
-                return Some(Ok((
+                let metadata = dir_entry.metadata()?;
+                return Ok(vec![(
                     Cow::Owned(filename.to_owned()),
                     dispatch_found(
                         &filename,
@@ -349,31 +351,31 @@ 
                         &dmap.copy_map,
                         options,
                     ),
-                )));
+                )]);
             }
         } else if (matcher.matches_everything() || matcher.matches(&filename))
             && !ignore_fn(&filename)
         {
-            return Some(Ok((
+            return Ok(vec![(
                 Cow::Owned(filename.to_owned()),
                 Dispatch::Unknown,
-            )));
+            )]);
         } else if ignore_fn(&filename) {
-            return Some(Ok((
+            return Ok(vec![(
                 Cow::Owned(filename.to_owned()),
                 Dispatch::Ignored,
-            )));
+            )]);
         }
     } else if let Some(entry) = entry_option {
         // Used to be a file or a folder, now something else.
         if matcher.matches_everything() || matcher.matches(&filename) {
-            return Some(Ok((
+            return Ok(vec![(
                 Cow::Owned(filename.to_owned()),
                 dispatch_missing(entry.state),
-            )));
+            )]);
         }
     }
-    None
+    return Ok(vec![]);
 }
 
 /// Walk the working directory recursively to look for changes compared to the
@@ -388,73 +390,96 @@ 
     options: StatusOptions,
 ) -> IoResult<FastHashMap<Cow<'a, HgPath>, Dispatch>> {
     let root_dir = root_dir.as_ref();
-    let mut new_results = FastHashMap::default();
 
-    let mut work = VecDeque::new();
-    work.push_front(path.as_ref().to_owned());
-
-    while let Some(ref directory) = work.pop_front() {
-        if directory.as_bytes() == b".hg" {
-            continue;
-        }
-        let visit_entries = match matcher.visit_children_set(directory) {
-            VisitChildrenSet::Empty => continue,
-            VisitChildrenSet::This | VisitChildrenSet::Recursive => None,
-            VisitChildrenSet::Set(set) => Some(set),
-        };
-        let buf = hg_path_to_path_buf(directory)?;
-        let dir_path = root_dir.join(buf);
-
-        let skip_dot_hg = !directory.as_bytes().is_empty();
-        let entries = match list_directory(dir_path, skip_dot_hg) {
-            Err(e) => match e.kind() {
-                ErrorKind::NotFound | ErrorKind::PermissionDenied => {
-                    new_results.insert(
-                        Cow::Owned(directory.to_owned()),
-                        Dispatch::Bad(BadMatch::OsError(
-                            // Unwrapping here is OK because the error always
-                            // is a real os error
-                            e.raw_os_error().unwrap(),
-                        )),
-                    );
-                    continue;
-                }
-                _ => return Err(e),
-            },
-            Ok(entries) => entries,
-        };
-
-        for (filename, dir_entry) in entries {
-            if let Some(ref set) = visit_entries {
-                if !set.contains(filename.deref()) {
-                    continue;
-                }
-            }
-            // TODO normalize
-            let filename = if directory.is_empty() {
-                filename.to_owned()
-            } else {
-                directory.join(&filename)
-            };
-
-            if !old_results.contains_key(filename.deref()) {
-                if let Some((res, dispatch)) = traverse_worker(
-                    &mut work, matcher, &dmap, &filename, &dir_entry,
-                    &ignore_fn, options,
-                )
-                .transpose()?
-                {
-                    new_results.insert(res, dispatch);
-                }
-            }
-        }
-    }
+    let mut new_results: FastHashMap<Cow<'a, HgPath>, Dispatch> =
+        traverse_dir(
+            matcher,
+            root_dir,
+            &dmap,
+            path,
+            &old_results,
+            &ignore_fn,
+            options,
+        )?
+        .into_par_iter()
+        .collect();
 
     new_results.extend(old_results.into_iter());
 
     Ok(new_results)
 }
 
+/// Decides whether the directory needs to be listed, and if so dispatches its
+/// entries
+fn traverse_dir<'a>(
+    matcher: &(impl Matcher + Sync),
+    root_dir: impl AsRef<Path>,
+    dmap: &DirstateMap,
+    path: impl AsRef<HgPath>,
+    old_results: &FastHashMap<Cow<'a, HgPath>, Dispatch>,
+    ignore_fn: &(impl for<'r> Fn(&'r HgPath) -> bool + Sync),
+    options: StatusOptions,
+) -> IoResult<Vec<(Cow<'a, HgPath>, Dispatch)>> {
+    let directory = path.as_ref();
+    if directory.as_bytes() == b".hg" {
+        return Ok(vec![]);
+    }
+    let visit_entries = match matcher.visit_children_set(directory) {
+        VisitChildrenSet::Empty => return Ok(vec![]),
+        VisitChildrenSet::This | VisitChildrenSet::Recursive => None,
+        VisitChildrenSet::Set(set) => Some(set),
+    };
+    let buf = hg_path_to_path_buf(directory)?;
+    let dir_path = root_dir.as_ref().join(buf);
+
+    let skip_dot_hg = !directory.as_bytes().is_empty();
+    let entries = match list_directory(dir_path, skip_dot_hg) {
+        Err(e) => match e.kind() {
+            ErrorKind::NotFound | ErrorKind::PermissionDenied => {
+                return Ok(vec![(
+                    Cow::Owned(directory.to_owned()),
+                    Dispatch::Bad(BadMatch::OsError(
+                        // Unwrapping here is OK because the error always
+                        // is a real os error
+                        e.raw_os_error().unwrap(),
+                    )),
+                )]);
+            }
+            _ => return Err(e),
+        },
+        Ok(entries) => entries,
+    };
+
+    let mut new_results = vec![];
+    for (filename, dir_entry) in entries {
+        if let Some(ref set) = visit_entries {
+            if !set.contains(filename.deref()) {
+                continue;
+            }
+        }
+        // TODO normalize
+        let filename = if directory.is_empty() {
+            filename.to_owned()
+        } else {
+            directory.join(&filename)
+        };
+
+        if !old_results.contains_key(filename.deref()) {
+            new_results.extend(handle_traversed_entry(
+                &dir_entry,
+                matcher,
+                root_dir.as_ref(),
+                &dmap,
+                &filename,
+                old_results,
+                ignore_fn,
+                options,
+            )?);
+        }
+    }
+    Ok(new_results)
+}
+
 /// Stat all entries in the `DirstateMap` and mark them for dispatch.
 fn stat_dmap_entries(
     dmap: &DirstateMap,