Patchwork D7922: rust-matchers: add function to generate a regex matcher function

login
register
mail settings
Submitter phabricator
Date Jan. 17, 2020, 10:42 a.m.
Message ID <differential-rev-PHID-DREV-rdnmdm3apa5nq4fdwhab-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44471/
State Superseded
Headers show

Comments

phabricator - Jan. 17, 2020, 10:42 a.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This function will be used to help build the upcoming `IncludeMatcher`. While
  Re2 is still used and behind a feature flag, this function returns an error
  meant for fallback in the default case.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

AFFECTED FILES
  rust/hg-core/src/lib.rs
  rust/hg-core/src/matchers.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 7, 2020, 9:47 p.m.
pulkit added a comment.


  This one failed to apply on latest default, needs rebase.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit
Cc: durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 12:48 a.m.
This revision now requires changes to proceed.
martinvonz added inline comments.
martinvonz requested changes to this revision.

INLINE COMMENTS

> matchers.rs:224
> +
> +const MAX_RE_SIZE: usize = 20000;
> +

Hmm, I don't like to replicate this into Rust. I argued for a long time with Boris over a year ago that we should see if we can remove it from Python. He said they (Octobus, I think) would look into that if I would just queue the workaround for the time being. Could you see if you can simplify the Python code first instead?

See https://patchwork.mercurial-scm.org/patch/36755/ for discussion.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 10:09 a.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in matchers.rs:224
> Hmm, I don't like to replicate this into Rust. I argued for a long time with Boris over a year ago that we should see if we can remove it from Python. He said they (Octobus, I think) would look into that if I would just queue the workaround for the time being. Could you see if you can simplify the Python code first instead?
> 
> See https://patchwork.mercurial-scm.org/patch/36755/ for discussion.

I am having a little trouble reading the patchwork thread, but I gather that you want to get rid of the preventive splitting of patterns and just let the regex engine handle it on its own? Was this measure taken because of a bug in Python's `re` or because its exceptions were too coarse/unusable?
I'll look into the behavior of Re2, in that case.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 11, 2020, 10:58 a.m.
Alphare added inline comments.

INLINE COMMENTS

> Alphare wrote in matchers.rs:224
> I am having a little trouble reading the patchwork thread, but I gather that you want to get rid of the preventive splitting of patterns and just let the regex engine handle it on its own? Was this measure taken because of a bug in Python's `re` or because its exceptions were too coarse/unusable?
> I'll look into the behavior of Re2, in that case.

I've looked into the behavior of Re2. It will return an error if the DFA runs out of memory, which seems perfectly reasonable. 
I will simplify the Rust code, however I feel like you're better suited than I am to fix the Python side of things, since I don't really understand the ins-and-outs of the problem.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 28, 2020, 4:57 p.m.
durin42 added a comment.
durin42 accepted this revision as: durin42.


  I think this looks good, but I want Martin to ack that his concern is adequately resolved.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Feb. 28, 2020, 5:38 p.m.
This revision is now accepted and ready to land.
martinvonz added a comment.
martinvonz accepted this revision.


  In D7922#121720 <https://phab.mercurial-scm.org/D7922#121720>, @durin42 wrote:
  
  > I think this looks good, but I want Martin to ack that his concern is adequately resolved.
  
  Yes, looks good to me now. Thanks.

INLINE COMMENTS

> Alphare wrote in matchers.rs:224
> I've looked into the behavior of Re2. It will return an error if the DFA runs out of memory, which seems perfectly reasonable. 
> I will simplify the Rust code, however I feel like you're better suited than I am to fix the Python side of things, since I don't really understand the ins-and-outs of the problem.

Thanks for fixing the Rust side.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - March 9, 2020, 9:06 a.m.
pulkit added a comment.


  @martinvonz this seems accepted by you and Augie. Should I go ahead and push it or I missed some discussion on IRC about it?

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - March 10, 2020, 3:20 p.m.
martinvonz added a comment.


  In D7922#123057 <https://phab.mercurial-scm.org/D7922#123057>, @pulkit wrote:
  
  > @martinvonz this (and next few patches) seems accepted by you and Augie. Should I go ahead and push it or I missed some discussion on IRC about it?
  
  You can push it. I wanted to finish the cleanup to the regex grouping that I mentioned earlier, but that can be done separately. But that's the only reason I haven't queued this patch anyway.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - March 10, 2020, 4:36 p.m.
pulkit added a comment.


  In D7922#123193 <https://phab.mercurial-scm.org/D7922#123193>, @martinvonz wrote:
  
  > In D7922#123057 <https://phab.mercurial-scm.org/D7922#123057>, @pulkit wrote:
  >
  >> @martinvonz this (and next few patches) seems accepted by you and Augie. Should I go ahead and push it or I missed some discussion on IRC about it?
  >
  > You can push it. I wanted to finish the cleanup to the regex grouping that I mentioned earlier, but that can be done separately. But that's the only reason I haven't queued this patch anyway.
  
  You mean cleanup on the Python side or on the rust side? IIUC, the rust side is fine now.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - March 10, 2020, 4:37 p.m.
martinvonz added a comment.


  In D7922#123207 <https://phab.mercurial-scm.org/D7922#123207>, @pulkit wrote:
  
  > In D7922#123193 <https://phab.mercurial-scm.org/D7922#123193>, @martinvonz wrote:
  >
  >> In D7922#123057 <https://phab.mercurial-scm.org/D7922#123057>, @pulkit wrote:
  >>
  >>> @martinvonz this (and next few patches) seems accepted by you and Augie. Should I go ahead and push it or I missed some discussion on IRC about it?
  >>
  >> You can push it. I wanted to finish the cleanup to the regex grouping that I mentioned earlier, but that can be done separately. But that's the only reason I haven't queued this patch anyway.
  >
  > You mean cleanup on the Python side or on the rust side? IIUC, the rust side is fine now.
  
  On the Python side.

REPOSITORY
  rHG Mercurial

BRANCH
  default

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7922/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7922

To: Alphare, #hg-reviewers, pulkit, martinvonz, durin42
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/matchers.rs b/rust/hg-core/src/matchers.rs
--- a/rust/hg-core/src/matchers.rs
+++ b/rust/hg-core/src/matchers.rs
@@ -7,7 +7,12 @@ 
 
 //! Structs and types for matching files and directories.
 
-use crate::{utils::hg_path::HgPath, DirsMultiset, DirstateMapError};
+#[cfg(feature = "with-re2")]
+use crate::re2::Re2;
+use crate::{
+    filepatterns::PatternResult, utils::hg_path::HgPath, DirsMultiset,
+    DirstateMapError, PatternError,
+};
 use std::collections::HashSet;
 use std::iter::FromIterator;
 use std::ops::Deref;
@@ -215,6 +220,28 @@ 
         true
     }
 }
+
+const MAX_RE_SIZE: usize = 20000;
+
+#[cfg(feature = "with-re2")]
+/// Returns a function that matches an `HgPath` against the given regex
+/// pattern.
+///
+/// This can fail when the pattern is invalid or not supported by the
+/// underlying engine `Re2`, for instance anything with back-references.
+fn re_matcher(
+    pattern: &[u8],
+) -> PatternResult<impl Fn(&HgPath) -> bool + Sync> {
+    let regex = Re2::new(pattern);
+    let regex = regex.map_err(|e| PatternError::UnsupportedSyntax(e))?;
+    Ok(move |path: &HgPath| regex.is_match(path.as_bytes()))
+}
+
+#[cfg(not(feature = "with-re2"))]
+fn re_matcher(_: &[u8]) -> PatternResult<Box<dyn Fn(&HgPath) -> bool + Sync>> {
+    Err(PatternError::Re2NotInstalled)
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/rust/hg-core/src/lib.rs b/rust/hg-core/src/lib.rs
--- a/rust/hg-core/src/lib.rs
+++ b/rust/hg-core/src/lib.rs
@@ -126,6 +126,9 @@ 
     /// Needed a pattern that can be turned into a regex but got one that can't
     /// This should only happen through programmer error.
     NonRegexPattern(IgnorePattern),
+    /// This is temporary, see `re2/mod.rs`.
+    /// This will cause a fallback to Python.
+    Re2NotInstalled,
 }
 
 impl ToString for PatternError {
@@ -148,6 +151,10 @@ 
             PatternError::NonRegexPattern(pattern) => {
                 format!("'{:?}' cannot be turned into a regex", pattern)
             }
+            PatternError::Re2NotInstalled => {
+                "Re2 is not installed, cannot use regex functionality."
+                    .to_string()
+            }
         }
     }
 }