Patchwork D7528: rust-matchers: add `FileMatcher` implementation

login
register
mail settings
Submitter phabricator
Date Nov. 29, 2019, 5:57 p.m.
Message ID <differential-rev-PHID-DREV-fft3gckyv37v2pkqtikr-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/43534/
State Superseded
Headers show

Comments

phabricator - Nov. 29, 2019, 5:57 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Mercurial defines an `exactmatcher`, I find `FileMatcher` to be clearer, but
  am not opposed to using the old name.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7528

AFFECTED FILES
  rust/hg-core/src/matchers.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Dec. 10, 2019, 9:34 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> matchers.rs:129-132
> +/// assert_eq!(true, matcher.matches(HgPath::new(b"a.txt")));
> +/// assert_eq!(false, matcher.matches(HgPath::new(b"b.txt")));
> +/// assert_eq!(false, matcher.matches(HgPath::new(b"main.c")));
> +/// assert_eq!(true, matcher.matches(HgPath::new(br"re:.*\.c$")));

Same comment as on another (already submitted) patch: the argument order feels backwards. Do you mind changing it?

> matchers.rs:136
> +pub struct FileMatcher<'a> {
> +    files: HashSet<&'a HgPath>,
> +    dirs: DirsMultiset,

It would be slightly simpler if the paths were owned. It doesn't seem too expensive to take ownership of the paths. Do we have any call sites in this series that we can look at?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7528/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7528

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Dec. 11, 2019, 9:29 a.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:136
> It would be slightly simpler if the paths were owned. It doesn't seem too expensive to take ownership of the paths. Do we have any call sites in this series that we can look at?

There is a single call site for now in D7529 <https://phab.mercurial-scm.org/D7529>, it's really straightforward.

Automated tools like CI can produce matchers with a lot of files and it seems to me not worth changing existing no-allocation code to get rid of a few lifetime annotations. Can we see if it gets too unwieldy?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7528/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7528

To: Alphare, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/matchers.rs b/rust/hg-core/src/matchers.rs
--- a/rust/hg-core/src/matchers.rs
+++ b/rust/hg-core/src/matchers.rs
@@ -7,8 +7,11 @@ 
 
 //! Structs and types for matching files and directories.
 
-use crate::utils::hg_path::{HgPath, HgPathBuf};
+use crate::utils::hg_path::HgPath;
+use crate::DirsMultiset;
 use std::collections::HashSet;
+use std::iter::FromIterator;
+use std::ops::Deref;
 
 pub enum VisitChildrenSet<'a> {
     /// Don't visit anything
@@ -114,3 +117,92 @@ 
         false
     }
 }
+
+/// Matches the input files exactly. They are interpreted as paths, not
+/// patterns.
+///
+///```
+/// use hg::{ matchers::{Matcher, FileMatcher}, utils::hg_path::HgPath };
+///
+/// let files = [HgPath::new(b"a.txt"), HgPath::new(br"re:.*\.c$")];
+/// let matcher = FileMatcher::new(&files);
+///
+/// assert_eq!(true, matcher.matches(HgPath::new(b"a.txt")));
+/// assert_eq!(false, matcher.matches(HgPath::new(b"b.txt")));
+/// assert_eq!(false, matcher.matches(HgPath::new(b"main.c")));
+/// assert_eq!(true, matcher.matches(HgPath::new(br"re:.*\.c$")));
+/// ```
+#[derive(Debug)]
+pub struct FileMatcher<'a> {
+    files: HashSet<&'a HgPath>,
+    dirs: DirsMultiset,
+}
+
+impl<'a> FileMatcher<'a> {
+    pub fn new(files: &'a [impl AsRef<HgPath>]) -> Self {
+        Self {
+            files: HashSet::from_iter(files.iter().map(|f| f.as_ref())),
+            dirs: DirsMultiset::from_manifest(files),
+        }
+    }
+    fn inner_matches(&self, filename: impl AsRef<HgPath>) -> bool {
+        self.files.contains(filename.as_ref())
+    }
+}
+
+impl<'a> Matcher for FileMatcher<'a> {
+    fn file_set(&self) -> Option<&HashSet<&HgPath>> {
+        Some(&self.files)
+    }
+    fn exact_match(&self, filename: impl AsRef<HgPath>) -> bool {
+        self.inner_matches(filename)
+    }
+    fn matches(&self, filename: impl AsRef<HgPath>) -> bool {
+        self.inner_matches(filename)
+    }
+    fn visit_children_set(
+        &self,
+        directory: impl AsRef<HgPath>,
+    ) -> VisitChildrenSet {
+        if self.files.len() == 0 || !self.dirs.contains(directory.as_ref()) {
+            return VisitChildrenSet::Empty;
+        }
+        let minus = self
+            .dirs
+            .iter()
+            .filter_map(|k| {
+                if k == HgPath::new(b"") {
+                    None
+                } else {
+                    Some(k.deref())
+                }
+            })
+            .collect();
+        let candidates = self.files.union(&minus);
+
+        let candidates: Vec<&HgPath> = if directory.as_ref().len() == 0 {
+            candidates.map(|f| *f).collect()
+        } else {
+            let mut d = directory.as_ref().to_owned();
+            d.push(b'/');
+            candidates
+                .into_iter()
+                .filter_map(|c| c.relative_to(&d))
+                .collect()
+        };
+
+        let ret = candidates
+            .into_iter()
+            .filter_map(|c| if !c.contains(b'/') { Some(c) } else { None })
+            .collect();
+
+        VisitChildrenSet::Set(ret)
+    }
+    fn matches_everything(&self) -> bool {
+        false
+    }
+    fn is_exact(&self) -> bool {
+        true
+    }
+}
+