Patchwork D6231: rust-discovery: starting core implementation

login
register
mail settings
Submitter phabricator
Date April 12, 2019, 6:34 p.m.
Message ID <differential-rev-PHID-DREV-dbgw4tpswqtbazhlxv4q-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/39576/
State Superseded
Headers show

Comments

phabricator - April 12, 2019, 6:34 p.m.
gracinet created this revision.
Herald added subscribers: mercurial-devel, mjpieters, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Once exposed to the Python side, this core object will avoid
  costly roundtrips with potentially big sets of revisions.
  
  This changeset implements the core logic of the object only, i.e.,
  manipulation of the missing, common and undefined set-like revision
  attributes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

AFFECTED FILES
  rust/hg-core/src/discovery.rs
  rust/hg-core/src/lib.rs

CHANGE DETAILS




To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 14, 2019, 8:14 p.m.
kevincox accepted this revision.
kevincox added a comment.


  I'm not a huge fan of the design of this class because it has a strong protocol to use correctly. It would probably be better to have 3 classes and when you apply new information each class transitions into the next one. This allows the type system to ensure that you are using the class correctly. However it appears that this is just re-implementing an API so improving that API is probably out of scope for the time being.
  
  Otherwise the implementation looks good.

INLINE COMMENTS

> discovery.rs:57
> +        self.common
> +            .remove_ancestors_from(self.undecided.as_mut().unwrap())?;
> +        Ok(())

You can replace the early return by an `if let`. This also removed the `.unwrap()` which is always nice.

  if let Some(ref mut undecided) = self.undecided {
      self.common.remove_ancestors_from(undecided)?;
  }

> discovery.rs:73
> +        let undecided_mut = self.undecided.as_mut().unwrap();
> +        for missrev in range.into_iter() {
> +            self.missing.insert(missrev);

I believe this `.into_iter()` is redundant because the `for` loop calls it for you.

> discovery.rs:93
> +    /// This is the end goal of the discovery process seeks,
> +    /// once `self.is_complete()` returns `true`.
> +    pub fn common_heads(&self) -> Result<HashSet<Revision>, GraphError> {

This this imply that this method shouldn't be called until then? If so you should probably add a `debug_assert!(self.is_complete())`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 10:09 a.m.
gracinet added a comment.


  @kevincox Thanks for the reviews !
  
  > It would probably be better to have 3 classes and when you apply new information each class transitions into the next one. This allows the type system to ensure that you are using the class correctly.
  
  That's an interesting idea. I expect it to solve my gripes with `ensure_undecided()` neatly. I'm not sure how it'd fare on the Python side, and probably wouldn't do it in the bindings to maintain compatibility, but I'll play with it a bit.
  Maybe this could be a good topic for a follow-up ?
  Note: I still have the various sampling methods to submit today or tomorrow, but I think they would play well with that idea.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 10:21 a.m.
kevincox added a comment.


  In https://phab.mercurial-scm.org/D6231#90744, @gracinet wrote:
  
  > That's an interesting idea. I expect it to solve my gripes with `ensure_undecided()` neatly. I'm not sure how it'd fare on the Python side, and probably wouldn't do it in the bindings to maintain compatibility, but I'll play with it a bit.
  
  
  Well the point is to force the caller to acknowledge the state change, but in python that isn't really possible because it is dynamically typed. What I would probably do is still implement the Rust part with multiple structs but then wrap it in an enum for the python API. Then each python method call turns into.
  
    fn do_thing() {
      match self {
        HasInfo(state) => state.do_thing()
        other => unreachable!("Invalid state for do_thing(). Expected HasInfo got {:?}.", other);
      }
    }
  
  And yes, this should allow you to avoid all of the `unwrap`ing since it would all be done at the API boundary.
  
  I think you could also propagate this to python by having the "state" transition methods return a new object. However because you can't ensure that the code drops the access to the previous object you would have to copy everything into the new object or ensure that you don't mutate shared state. The value is still lower here though because of the dynamic nature of python.
  
  > Maybe this could be a good topic for a follow-up ?
  
  Yeah, sounds reasonable.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 4:25 p.m.
gracinet added a comment.


  > What I would probably do is still implement the Rust part with multiple structs but then wrap it in an enum for the python API.
  
  Yes, that's what I had in mind. Not sure where that enum would be defined (hg-cpython or hg-core) but it doesn't matter so much.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 4:43 p.m.
kevincox added a comment.


  In https://phab.mercurial-scm.org/D6231#90756, @gracinet wrote:
  
  > Yes, that's what I had in mind. Not sure where that enum would be defined (hg-cpython or hg-core) but it doesn't matter so much.
  
  
  I would probably stick it in hg-cpython because it is a compromise for the python binding. For anyone using hg-core from rust I would hope that they directly use the structs.
  
  But yes, not to important either way.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 4:50 p.m.
gracinet added inline comments.

INLINE COMMENTS

> kevincox wrote in discovery.rs:57
> You can replace the early return by an `if let`. This also removed the `.unwrap()` which is always nice.
> 
>   if let Some(ref mut undecided) = self.undecided {
>       self.common.remove_ancestors_from(undecided)?;
>   }

ah, yes the trick is the `ref mut` inside the `if let`, I see

> kevincox wrote in discovery.rs:93
> This this imply that this method shouldn't be called until then? If so you should probably add a `debug_assert!(self.is_complete())`.

I'm not sure. It wouldn't be a straight error to call this early, as long as the caller knows what this means, but I don't have any use-case before the discovery is complete.

On the other hand, I could go as far as consume the discovery object and use `self.common.into_bases_heads()`, but maybe that could make things more complicated for the Python bindings.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 15, 2019, 4:59 p.m.
kevincox added inline comments.

INLINE COMMENTS

> gracinet wrote in discovery.rs:93
> I'm not sure. It wouldn't be a straight error to call this early, as long as the caller knows what this means, but I don't have any use-case before the discovery is complete.
> 
> On the other hand, I could go as far as consume the discovery object and use `self.common.into_bases_heads()`, but maybe that could make things more complicated for the Python bindings.

If it is allowed can you document what happens what happens if you call it before then?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 17, 2019, 12:32 p.m.
gracinet marked an inline comment as done.
gracinet added a comment.


  I believe to have addressed your immediate concerns.
  
  I played with  splitting in different `struct`s for the various stages of the process. It is indeed cleaner and clearer Rust code.
  However mutating in place of `Enum`s exposed to Python is a bit of a headache, because of the wrapping in `RefCell<Box<T>>`, and I don't have a solution for that, so I'm gonna think about that in a follow-up.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel
phabricator - April 17, 2019, 12:58 p.m.
kevincox added a comment.


  In https://phab.mercurial-scm.org/D6231#91102, @gracinet wrote:
  
  > I believe to have addressed your immediate concerns.
  >
  > I played with  splitting in different `struct`s for the various stages of the process. It is indeed cleaner and clearer Rust code.
  >  However mutating in place of `Enum`s exposed to Python is a bit of a headache, because of the wrapping in `RefCell<Box<T>>`, and I don't have a solution for that, so I'm gonna think about that in a follow-up.
  
  
  Thanks, it looks good for now :)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6231

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mjpieters, mercurial-devel

Patch

diff --git a/rust/hg-core/src/lib.rs b/rust/hg-core/src/lib.rs
--- a/rust/hg-core/src/lib.rs
+++ b/rust/hg-core/src/lib.rs
@@ -6,6 +6,7 @@ 
 pub mod dagops;
 pub use ancestors::{AncestorsIterator, LazyAncestors, MissingAncestors};
 pub mod testing;  // unconditionally built, for use from integration tests
+pub mod discovery;
 
 /// Mercurial revision numbers
 ///
diff --git a/rust/hg-core/src/discovery.rs b/rust/hg-core/src/discovery.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/discovery.rs
@@ -0,0 +1,191 @@ 
+// discovery.rs
+//
+// Copyright 2019 Georges Racinet <georges.racinet@octobus.net>
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Discovery operations
+//!
+//! This is a Rust counterpart to the `partialdiscovery` class of
+//! `mercurial.setdiscovery`
+
+use super::{Graph, GraphError, Revision};
+use crate::ancestors::MissingAncestors;
+use crate::dagops;
+use std::collections::HashSet;
+
+pub struct PartialDiscovery<G: Graph + Clone> {
+    target_heads: Option<Vec<Revision>>,
+    graph: G, // plays the role of self._repo
+    common: MissingAncestors<G>,
+    undecided: Option<HashSet<Revision>>,
+    missing: HashSet<Revision>,
+}
+
+impl<G: Graph + Clone> PartialDiscovery<G> {
+    /// Create a PartialDiscovery object, with the intent
+    /// of comparing our `::<target_heads>` revset to the contents of another
+    /// repo.
+    ///
+    /// For now `target_heads` is passed as a vector, and will be used
+    /// at the first call to `ensure_undecided()`.
+    ///
+    /// If we want to make the signature more flexible,
+    /// we'll have to make it a type argument of `PartialDiscovery` or a trait
+    /// object since we'll keep it in the meanwhile
+    pub fn new(graph: G, target_heads: Vec<Revision>) -> Self {
+        PartialDiscovery {
+            undecided: None,
+            target_heads: Some(target_heads),
+            graph: graph.clone(),
+            common: MissingAncestors::new(graph, vec![]),
+            missing: HashSet::new(),
+        }
+    }
+
+    /// Register revisions known as being common
+    pub fn add_common_revisions(
+        &mut self,
+        common: impl IntoIterator<Item = Revision>,
+    ) -> Result<(), GraphError> {
+        self.common.add_bases(common);
+        if self.undecided.is_none() {
+            return Ok(());
+        }
+        self.common
+            .remove_ancestors_from(self.undecided.as_mut().unwrap())?;
+        Ok(())
+    }
+
+    /// Register revisions known as being missing
+    pub fn add_missing_revisions(
+        &mut self,
+        missing: impl IntoIterator<Item = Revision>,
+    ) -> Result<(), GraphError> {
+        self.ensure_undecided()?;
+        let range = dagops::range(
+            &self.graph,
+            missing,
+            self.undecided.as_ref().unwrap().iter().cloned(),
+        )?;
+        let undecided_mut = self.undecided.as_mut().unwrap();
+        for missrev in range.into_iter() {
+            self.missing.insert(missrev);
+            undecided_mut.remove(&missrev);
+        }
+        Ok(())
+    }
+
+    /// Do we have any information about the peer?
+    pub fn has_info(&self) -> bool {
+        self.common.has_bases()
+    }
+
+    /// Did we acquire full knowledge of our Revisions that the peer has?
+    pub fn is_complete(&self) -> bool {
+        self.undecided.as_ref().map_or(false, |s| s.is_empty())
+    }
+
+    /// Return the heads of the common set of revisions
+    ///
+    /// This is the end goal of the discovery process seeks,
+    /// once `self.is_complete()` returns `true`.
+    pub fn common_heads(&self) -> Result<HashSet<Revision>, GraphError> {
+        self.common.bases_heads()
+    }
+
+    /// Force first computation of `self.undecided`
+    ///
+    /// After this, `self.undecided.as_ref()` and `.as_mut()` can be
+    /// unwrapped to get workable immutable or mutable references without
+    /// any panic.
+    ///
+    /// This is an imperative call instead of an access with added lazyness
+    /// to reduce easily the scope of mutable borrow for the caller,
+    /// compared to undecided(&'a mut self) -> &'a… that would keep it
+    /// as long as the resulting immutable one.
+    fn ensure_undecided(&mut self) -> Result<(), GraphError> {
+        if self.undecided.is_some() {
+            return Ok(());
+        }
+        let tgt = self.target_heads.take().unwrap();
+        self.undecided =
+            Some(self.common.missing_ancestors(tgt)?.into_iter().collect());
+        Ok(())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::testing::SampleGraph;
+
+    /// A PartialDiscovery as for pushing all the heads of `SampleGraph`
+    fn full_disco() -> PartialDiscovery<SampleGraph> {
+        PartialDiscovery::new(SampleGraph, vec![10, 11, 12, 13])
+    }
+
+    fn sorted_undecided(
+        disco: &PartialDiscovery<SampleGraph>,
+    ) -> Vec<Revision> {
+        let mut as_vec: Vec<Revision> =
+            disco.undecided.as_ref().unwrap().iter().cloned().collect();
+        as_vec.sort();
+        as_vec
+    }
+
+    fn sorted_missing(disco: &PartialDiscovery<SampleGraph>) -> Vec<Revision> {
+        let mut as_vec: Vec<Revision> =
+            disco.missing.iter().cloned().collect();
+        as_vec.sort();
+        as_vec
+    }
+
+    fn sorted_common_heads(
+        disco: &PartialDiscovery<SampleGraph>,
+    ) -> Result<Vec<Revision>, GraphError> {
+        let mut as_vec: Vec<Revision> =
+            disco.common_heads()?.iter().cloned().collect();
+        as_vec.sort();
+        Ok(as_vec)
+    }
+
+    #[test]
+    fn test_add_common_get_undecided() -> Result<(), GraphError> {
+        let mut disco = full_disco();
+        assert_eq!(disco.undecided, None);
+        assert!(!disco.has_info());
+
+        disco.add_common_revisions(vec![11, 12])?;
+        assert!(disco.has_info());
+        assert!(!disco.is_complete());
+        assert!(disco.missing.is_empty());
+
+        // add_common_revisions did not trigger a premature computation
+        // of `undecided`, let's check that and ask for them
+        assert_eq!(disco.undecided, None);
+        disco.ensure_undecided()?;
+        assert_eq!(sorted_undecided(&disco), vec![5, 8, 10, 13]);
+        Ok(())
+    }
+
+    /// in this test, we pretend that our peer misses exactly (8+10)::
+    /// and we're comparing all our repo to it (as in a bare push)
+    #[test]
+    fn test_discovery() -> Result<(), GraphError> {
+        let mut disco = full_disco();
+        disco.add_common_revisions(vec![11, 12])?;
+        disco.add_missing_revisions(vec![8, 10])?;
+        assert_eq!(sorted_undecided(&disco), vec![5]);
+        assert_eq!(sorted_missing(&disco), vec![8, 10, 13]);
+        assert!(!disco.is_complete());
+
+        disco.add_common_revisions(vec![5])?;
+        assert_eq!(sorted_undecided(&disco), vec![]);
+        assert_eq!(sorted_missing(&disco), vec![8, 10, 13]);
+        assert!(disco.is_complete());
+        assert_eq!(sorted_common_heads(&disco)?, vec![5, 11, 12]);
+        Ok(())
+    }
+}