Patchwork D7909: rust-filepatterns: add support for `ignore` and `subignore` patterns

login
register
mail settings
Submitter phabricator
Date Jan. 16, 2020, 11:33 a.m.
Message ID <differential-rev-PHID-DREV-xkbqk6xlntkrgqn4x62c-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44429/
State Superseded
Headers show

Comments

phabricator - Jan. 16, 2020, 11:33 a.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This prepares a future patch for `IncludeMatcher` on the road to bare
  `hg status` support.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7909

AFFECTED FILES
  rust/hg-core/src/filepatterns.rs
  rust/hg-core/src/lib.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 16, 2020, 12:19 p.m.
Alphare added a comment.


  Note to reviewers: this stack is part of the larger series of getting a full Rust dirstate.status which is most of the performance hit in bare hg status. More patches are coming, but I figured I would send the patches as I go to help with review timing.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7909/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7909

To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 16, 2020, 3:30 p.m.
This revision now requires changes to proceed.
kevincox added inline comments.
kevincox requested changes to this revision.

INLINE COMMENTS

> filepatterns.rs:456
> +        .into_iter()
> +        .filter_map(|entry| -> Option<PatternResult<_>> {
> +            let IgnorePattern {

I don't see anywhere where this returns `None`. Should this just be `.map()` or `.flat_map()`?

> filepatterns.rs:462
> +            } = &entry;
> +            let mut res = vec![];
> +            match syntax {

I would write this as `Some(Ok(match syntax { .. }))` as the last expression of this map. Then instead of `res.extend()` and `res.push()` you just return the vecs.

> filepatterns.rs:525
> +                    p
> +                })
> +            })?,

How about:

  if !p.is_empty() {
    p.push(b'/');
  }
  Ok(p)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7909/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7909

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 16, 2020, 6:21 p.m.
Alphare added inline comments.
Alphare marked 2 inline comments as done.

INLINE COMMENTS

> kevincox wrote in filepatterns.rs:462
> I would write this as `Some(Ok(match syntax { .. }))` as the last expression of this map. Then instead of `res.extend()` and `res.push()` you just return the vecs.

The `match` arms would be incompatible. Or maybe I'm missing something.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7909/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7909

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 16, 2020, 7:40 p.m.
kevincox added inline comments.

INLINE COMMENTS

> Alphare wrote in filepatterns.rs:462
> The `match` arms would be incompatible. Or maybe I'm missing something.

You would have to make sure you return a Vec on both arms. You will probably want `inner_pats.into_iter().collect()` on one branch and `vec![entry]` on the other branch.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7909/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7909

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/lib.rs b/rust/hg-core/src/lib.rs
--- a/rust/hg-core/src/lib.rs
+++ b/rust/hg-core/src/lib.rs
@@ -122,6 +122,9 @@ 
     UnsupportedSyntaxInFile(String, String, usize),
     TooLong(usize),
     IO(std::io::Error),
+    /// Needed a pattern that can be turned into a regex but got one that can't
+    /// This should only happen through programmer error.
+    NonRegexPattern(IgnorePattern),
 }
 
 impl ToString for PatternError {
@@ -141,6 +144,9 @@ 
             }
             PatternError::IO(e) => e.to_string(),
             PatternError::Path(e) => e.to_string(),
+            PatternError::NonRegexPattern(pattern) => {
+                format!("'{:?}' cannot be turned into a regex", pattern)
+            }
         }
     }
 }
diff --git a/rust/hg-core/src/filepatterns.rs b/rust/hg-core/src/filepatterns.rs
--- a/rust/hg-core/src/filepatterns.rs
+++ b/rust/hg-core/src/filepatterns.rs
@@ -7,11 +7,19 @@ 
 
 //! Handling of Mercurial-specific patterns.
 
-use crate::{utils::SliceExt, FastHashMap, PatternError};
+use crate::{
+    utils::{
+        files::{canonical_path, get_bytes_from_path, get_path_from_bytes},
+        hg_path::{path_to_hg_path_buf, HgPathBuf, HgPathError},
+        SliceExt,
+    },
+    FastHashMap, PatternError,
+};
 use lazy_static::lazy_static;
 use regex::bytes::{NoExpand, Regex};
 use std::fs::File;
 use std::io::Read;
+use std::ops::Deref;
 use std::path::{Path, PathBuf};
 use std::vec::Vec;
 
@@ -53,6 +61,10 @@ 
     /// A path relative to repository root, which is matched non-recursively
     /// (will not match subdirectories)
     RootFiles,
+    /// A file of patterns to read and include
+    Include,
+    /// A file of patterns to match against files under the same directory
+    SubInclude,
 }
 
 /// Transforms a glob pattern into a regex
@@ -145,6 +157,8 @@ 
         b"relre:" => Ok(PatternSyntax::RelRegexp),
         b"glob:" => Ok(PatternSyntax::Glob),
         b"rootglob:" => Ok(PatternSyntax::RootGlob),
+        b"include:" => Ok(PatternSyntax::Include),
+        b"subinclude:" => Ok(PatternSyntax::SubInclude),
         _ => Err(PatternError::UnsupportedSyntax(
             String::from_utf8_lossy(kind).to_string(),
         )),
@@ -198,6 +212,7 @@ 
         PatternSyntax::Glob | PatternSyntax::RootGlob => {
             [glob_to_re(pattern).as_slice(), GLOB_SUFFIX].concat()
         }
+        PatternSyntax::Include | PatternSyntax::SubInclude => unreachable!(),
     }
 }
 
@@ -259,6 +274,9 @@ 
         | PatternSyntax::Path
         | PatternSyntax::RelGlob
         | PatternSyntax::RootFiles => normalize_path_bytes(&entry.pattern),
+        PatternSyntax::Include | PatternSyntax::SubInclude => {
+            return Err(PatternError::NonRegexPattern(entry))
+        }
         _ => entry.pattern,
     };
     if entry.syntax == PatternSyntax::RootGlob
@@ -423,6 +441,119 @@ 
 
 pub type PatternResult<T> = Result<T, PatternError>;
 
+/// Wrapper for `read_pattern_file` that also recursively expands `include:`
+/// patterns.
+///
+/// `subinclude:` is not treated as a special pattern here: unraveling them
+/// needs to occur in the "ignore" phase.
+pub fn get_patterns_from_file(
+    pattern_file: impl AsRef<Path>,
+    root_dir: impl AsRef<Path>,
+) -> PatternResult<(Vec<IgnorePattern>, Vec<PatternFileWarning>)> {
+    let (patterns, mut warnings) = read_pattern_file(&pattern_file, true)?;
+    let patterns = patterns
+        .into_iter()
+        .filter_map(|entry| -> Option<PatternResult<_>> {
+            let IgnorePattern {
+                syntax,
+                pattern,
+                source: _,
+            } = &entry;
+            let mut res = vec![];
+            match syntax {
+                PatternSyntax::Include => {
+                    let inner_include =
+                        root_dir.as_ref().join(get_path_from_bytes(&pattern));
+                    let (inner_pats, inner_warnings) =
+                        match get_patterns_from_file(
+                            &inner_include,
+                            root_dir.as_ref(),
+                        ) {
+                            Ok(x) => x,
+                            Err(e) => return Some(Err(e.into())),
+                        };
+                    warnings.extend(inner_warnings);
+                    res.extend(inner_pats);
+                }
+                _ => res.push(entry),
+            };
+            Some(Ok(res))
+        })
+        .flatten()
+        .flatten()
+        .collect();
+
+    Ok((patterns, warnings))
+}
+
+/// Holds all the information needed to handle a `subinclude:` pattern.
+pub struct SubInclude {
+    /// Will be used for repository (hg) paths that start with this prefix.
+    /// It is relative to the current working directory, so comparing against
+    /// repository paths is painless.
+    pub prefix: HgPathBuf,
+    /// The file itself, containing the patterns
+    pub path: PathBuf,
+    /// Folder in the filesystem where this it applies
+    pub root: PathBuf,
+}
+
+impl SubInclude {
+    pub fn new(
+        root_dir: impl AsRef<Path>,
+        pattern: &[u8],
+        source: impl AsRef<Path>,
+    ) -> Result<SubInclude, HgPathError> {
+        let normalized_source =
+            normalize_path_bytes(&get_bytes_from_path(source));
+
+        let source_root = get_path_from_bytes(&normalized_source);
+        let source_root = source_root.parent().unwrap_or(source_root.deref());
+
+        let path = source_root.join(get_path_from_bytes(pattern));
+        let new_root = path.parent().unwrap_or(path.deref());
+
+        let prefix = canonical_path(&root_dir, &root_dir, new_root)?;
+
+        Ok(Self {
+            prefix: path_to_hg_path_buf(prefix).and_then(|mut p| {
+                Ok(if p.is_empty() {
+                    p
+                } else {
+                    p.push(b'/');
+                    p
+                })
+            })?,
+            path: path.to_owned(),
+            root: new_root.to_owned(),
+        })
+    }
+}
+
+/// Separate and pre-process subincludes from other patterns for the "ignore"
+/// phase.
+pub fn filter_subincludes(
+    ignore_patterns: &[IgnorePattern],
+    root_dir: impl AsRef<Path>,
+) -> Result<(Vec<SubInclude>, Vec<&IgnorePattern>), HgPathError> {
+    let mut subincludes = vec![];
+    let mut others = vec![];
+
+    for ignore_pattern in ignore_patterns.iter() {
+        let IgnorePattern {
+            syntax,
+            pattern,
+            source,
+        } = ignore_pattern;
+        if *syntax == PatternSyntax::SubInclude {
+            subincludes.push(SubInclude::new(&root_dir, pattern, &source)?);
+        } else {
+            others.push(ignore_pattern)
+        }
+    }
+    Ok((subincludes, others))
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;