Patchwork D7922: rust-matchers: add function to generate a regex matcher function

login
register
mail settings
Submitter phabricator
Date Jan. 17, 2020, 10:42 a.m.
Message ID <differential-rev-PHID-DREV-rdnmdm3apa5nq4fdwhab-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44471/
State New
Headers show

Comments

phabricator - Jan. 17, 2020, 10:42 a.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This function will be used to help build the upcoming `IncludeMatcher`. While
  Re2 is still used and behind a feature flag, this function returns an error
  meant for fallback in the default case.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

AFFECTED FILES
  rust/hg-core/src/lib.rs
  rust/hg-core/src/matchers.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 7, 2020, 9:47 p.m.
pulkit added a comment.


  This one failed to apply on latest default, needs rebase.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 12:48 a.m.
This revision now requires changes to proceed.
martinvonz added inline comments.
martinvonz requested changes to this revision.

INLINE COMMENTS

> matchers.rs:224
> +
> +const MAX_RE_SIZE: usize = 20000;
> +

Hmm, I don't like to replicate this into Rust. I argued for a long time with Boris over a year ago that we should see if we can remove it from Python. He said they (Octobus, I think) would look into that if I would just queue the workaround for the time being. Could you see if you can simplify the Python code first instead?

See https://patchwork.mercurial-scm.org/patch/36755/ for discussion.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 10:09 a.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:224
> Hmm, I don't like to replicate this into Rust. I argued for a long time with Boris over a year ago that we should see if we can remove it from Python. He said they (Octobus, I think) would look into that if I would just queue the workaround for the time being. Could you see if you can simplify the Python code first instead?
> 
> See https://patchwork.mercurial-scm.org/patch/36755/ for discussion.

I am having a little trouble reading the patchwork thread, but I gather that you want to get rid of the preventive splitting of patterns and just let the regex engine handle it on its own? Was this measure taken because of a bug in Python's `re` or because its exceptions were too coarse/unusable?
I'll look into the behavior of Re2, in that case.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 10:58 a.m.
Alphare added inline comments.

INLINE COMMENTS

> Alphare wrote in matchers.rs:224
> I am having a little trouble reading the patchwork thread, but I gather that you want to get rid of the preventive splitting of patterns and just let the regex engine handle it on its own? Was this measure taken because of a bug in Python's `re` or because its exceptions were too coarse/unusable?
> I'll look into the behavior of Re2, in that case.

I've looked into the behavior of Re2. It will return an error if the DFA runs out of memory, which seems perfectly reasonable. 
I will simplify the Rust code, however I feel like you're better suited than I am to fix the Python side of things, since I don't really understand the ins-and-outs of the problem.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/matchers.rs b/rust/hg-core/src/matchers.rs
--- a/rust/hg-core/src/matchers.rs
+++ b/rust/hg-core/src/matchers.rs
@@ -7,7 +7,12 @@ 
 
 //! Structs and types for matching files and directories.
 
-use crate::{utils::hg_path::HgPath, DirsMultiset, DirstateMapError};
+#[cfg(feature = "with-re2")]
+use crate::re2::Re2;
+use crate::{
+    filepatterns::PatternResult, utils::hg_path::HgPath, DirsMultiset,
+    DirstateMapError, PatternError,
+};
 use std::collections::HashSet;
 use std::iter::FromIterator;
 use std::ops::Deref;
@@ -215,6 +220,28 @@ 
         true
     }
 }
+
+const MAX_RE_SIZE: usize = 20000;
+
+#[cfg(feature = "with-re2")]
+/// Returns a function that matches an `HgPath` against the given regex
+/// pattern.
+///
+/// This can fail when the pattern is invalid or not supported by the
+/// underlying engine `Re2`, for instance anything with back-references.
+fn re_matcher(
+    pattern: &[u8],
+) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
+    let regex = Re2::new(pattern);
+    let regex = regex.map_err(|e| PatternError::UnsupportedSyntax(e))?;
+    Ok(move |path: &HgPath| regex.is_match(path.as_bytes()))
+}
+
+#[cfg(not(feature = "with-re2"))]
+fn re_matcher(_: &[u8]) -> PatternResult<Box<dyn Fn(&HgPath) -> bool + Sync>> {
+    Err(PatternError::Re2NotInstalled)
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/rust/hg-core/src/lib.rs b/rust/hg-core/src/lib.rs
--- a/rust/hg-core/src/lib.rs
+++ b/rust/hg-core/src/lib.rs
@@ -126,6 +126,9 @@ 
     /// Needed a pattern that can be turned into a regex but got one that can't
     /// This should only happen through programmer error.
     NonRegexPattern(IgnorePattern),
+    /// This is temporary, see `re2/mod.rs`.
+    /// This will cause a fallback to Python.
+    Re2NotInstalled,
 }
 
 impl ToString for PatternError {
@@ -148,6 +151,10 @@ 
             PatternError::NonRegexPattern(pattern) => {
                 format!("'{:?}' cannot be turned into a regex", pattern)
             }
+            PatternError::Re2NotInstalled => {
+                "Re2 is not installed, cannot use regex functionality."
+                    .to_string()
+            }
         }
     }
 }