Patchwork D7914: rust-matchers: implement `visit_children_set` for `FileMatcher`

login
register
mail settings
Submitter phabricator
Date Jan. 16, 2020, 10:10 p.m.
Message ID <differential-rev-PHID-DREV-u3iz5rww54i5jrsksnr3-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44459/
State Superseded
Headers show

Comments

phabricator - Jan. 16, 2020, 10:10 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  As per the removed inline comment, this will become useful in a future patch
  in this series as the `IncludeMatcher` is introduced.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

AFFECTED FILES
  rust/hg-core/src/matchers.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 17, 2020, 6:35 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> matchers.rs:166
>      }
>      fn visit_children_set(
>          &self,

This will often be called repeatedly, so isn't it better to calculate a map of each parent directory to its `VisitChildrenSet` value upfront (in `FilesMatcher::new()`)?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 17, 2020, 9:17 a.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:166
> This will often be called repeatedly, so isn't it better to calculate a map of each parent directory to its `VisitChildrenSet` value upfront (in `FilesMatcher::new()`)?

Agreed. In much of this series there exist opportunities for caching/making things run in parallel, etc. With the freeze approaching really fast, I prefer to prioritize getting correct - albeit sub-optimal - code in rather than risking missing the deadline. 
My benchmarks of the entire series show an improvement in bare `hg status`  in all supported repositories (read: that don't have back-references in their `.hgignore patterns), and no measurable slowdown in others.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 17, 2020, 4:08 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> Alphare wrote in matchers.rs:166
> Agreed. In much of this series there exist opportunities for caching/making things run in parallel, etc. With the freeze approaching really fast, I prefer to prioritize getting correct - albeit sub-optimal - code in rather than risking missing the deadline. 
> My benchmarks of the entire series show an improvement in bare `hg status`  in all supported repositories (read: that don't have back-references in their `.hgignore patterns), and no measurable slowdown in others.

Yes, code quality is never urgent. The problem is that there isn't much incentive for people to clean it up after. So I'd really prefer to not queue this without the cleanup.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 17, 2020, 4:51 p.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:166
> Yes, code quality is never urgent. The problem is that there isn't much incentive for people to clean it up after. So I'd really prefer to not queue this without the cleanup.

I don't feel like this is an issue of code quality. This is (probably) an issue of performance, and I have a good incentive for making this code go faster in the future. I don't think this is necessary for now since this is a minor optimization compared to the other ones that are pending.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 17, 2020, 5:30 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> Alphare wrote in matchers.rs:166
> I don't feel like this is an issue of code quality. This is (probably) an issue of performance, and I have a good incentive for making this code go faster in the future. I don't think this is necessary for now since this is a minor optimization compared to the other ones that are pending.

How about we just replace the implementation by `VisitChildrenSet::Recursive`then? That's easy to maintain and makes it functionally correct. It's usually fast enough too.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 19, 2020, 6:29 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:166
> How about we just replace the implementation by `VisitChildrenSet::Recursive`then? That's easy to maintain and makes it functionally correct. It's usually fast enough too.

Alphare pointed out to me out of band that the Python code already does the same, so I agree that it makes sense to just copy that for now. I'm sorry that I didn't think of checking that before. It would have been even better to fix the Python version first and then copy the good code to Rust, but I'm still fine with doing that later.

PS. I haven't actually reviewed this code. I'll do that next week (and Monday is a holiday for me).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 10, 2020, 4:06 p.m.
Alphare added a comment.


  @martinvonz Any update on this patch? 
  Note: I will get to your remarks on the first change of the stack very soon as well.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7914/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7914

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/matchers.rs b/rust/hg-core/src/matchers.rs
--- a/rust/hg-core/src/matchers.rs
+++ b/rust/hg-core/src/matchers.rs
@@ -163,12 +163,48 @@ 
     }
     fn visit_children_set(
         &self,
-        _directory: impl AsRef<HgPath>,
+        directory: impl AsRef<HgPath>,
     ) -> VisitChildrenSet {
-        // TODO implement once we have `status.traverse`
-        // This is useless until unknown files are taken into account
-        // Which will not need to happen before the `IncludeMatcher`.
-        unimplemented!()
+        if self.files.is_empty() || !self.dirs.contains(&directory) {
+            return VisitChildrenSet::Empty;
+        }
+        let dirs_as_set = self.dirs.iter().map(|k| k.deref()).collect();
+
+        let mut candidates: HashSet<&HgPath> =
+            self.files.union(&dirs_as_set).map(|k| *k).collect();
+        candidates.remove(HgPath::new(b""));
+
+        if !directory.as_ref().is_empty() {
+            let directory = [directory.as_ref().as_bytes(), b"/"].concat();
+            candidates = candidates
+                .iter()
+                .filter_map(|c| {
+                    if c.as_bytes().starts_with(&directory) {
+                        Some(HgPath::new(&c.as_bytes()[directory.len()..]))
+                    } else {
+                        None
+                    }
+                })
+                .collect();
+        }
+
+        // `self.dirs` includes all of the directories, recursively, so if
+        // we're attempting to match 'foo/bar/baz.txt', it'll have '', 'foo',
+        // 'foo/bar' in it. Thus we can safely ignore a candidate that has a
+        // '/' in it, indicating it's for a subdir-of-a-subdir; the immediate
+        // subdir will be in there without a slash.
+        VisitChildrenSet::Set(
+            candidates
+                .iter()
+                .filter_map(|c| {
+                    if c.bytes().all(|b| *b != b'/') {
+                        Some(*c)
+                    } else {
+                        None
+                    }
+                })
+                .collect(),
+        )
     }
     fn matches_everything(&self) -> bool {
         false
@@ -177,3 +213,107 @@ 
         true
     }
 }
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use pretty_assertions::assert_eq;
+
+    #[test]
+    fn test_filematcher_visit_children_set() {
+        // Visitchildrenset
+        let files = vec![HgPath::new(b"dir/subdir/foo.txt")];
+        let matcher = FileMatcher::new(&files).unwrap();
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"dir"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"")),
+            VisitChildrenSet::Set(set)
+        );
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"subdir"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"dir")),
+            VisitChildrenSet::Set(set)
+        );
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"foo.txt"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"dir/subdir")),
+            VisitChildrenSet::Set(set)
+        );
+
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"dir/subdir/x")),
+            VisitChildrenSet::Empty
+        );
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"dir/subdir/foo.txt")),
+            VisitChildrenSet::Empty
+        );
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"folder")),
+            VisitChildrenSet::Empty
+        );
+    }
+
+    #[test]
+    fn test_filematcher_visit_children_set_files_and_dirs() {
+        let files = vec![
+            HgPath::new(b"rootfile.txt"),
+            HgPath::new(b"a/file1.txt"),
+            HgPath::new(b"a/b/file2.txt"),
+            // No file in a/b/c
+            HgPath::new(b"a/b/c/d/file4.txt"),
+        ];
+        let matcher = FileMatcher::new(&files).unwrap();
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"a"));
+        set.insert(HgPath::new(b"rootfile.txt"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"")),
+            VisitChildrenSet::Set(set)
+        );
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"b"));
+        set.insert(HgPath::new(b"file1.txt"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"a")),
+            VisitChildrenSet::Set(set)
+        );
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"c"));
+        set.insert(HgPath::new(b"file2.txt"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"a/b")),
+            VisitChildrenSet::Set(set)
+        );
+
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"d"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"a/b/c")),
+            VisitChildrenSet::Set(set)
+        );
+        let mut set = HashSet::new();
+        set.insert(HgPath::new(b"file4.txt"));
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"a/b/c/d")),
+            VisitChildrenSet::Set(set)
+        );
+
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"a/b/c/d/e")),
+            VisitChildrenSet::Empty
+        );
+        assert_eq!(
+            matcher.visit_children_set(HgPath::new(b"folder")),
+            VisitChildrenSet::Empty
+        );
+    }
+}