Patchwork D7791: rust-nodemap: NodeMap trait with simplest implementor

login
register
mail settings
Submitter phabricator
Date Jan. 6, 2020, 7:25 p.m.
Message ID <differential-rev-PHID-DREV-lnbjhm3ne7nzkrpnh3dj-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44149/
State New
Headers show

Comments

phabricator - Jan. 6, 2020, 7:25 p.m.
gracinet created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  We're defining here only a small part of the immutable
  methods it will have at the end. This is so we can
  focus in the following changesets on the needed abstractions
  for a mutable append-only serializable version.
  
  The first implementor exposes the actual lookup
  algorithm in its simplest form. It will have to be expanded
  to account for the missing methods, and the special cases
  related to NULL_NODE.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

AFFECTED FILES
  rust/hg-core/src/revlog/nodemap.rs

CHANGE DETAILS




To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 16, 2020, 10:08 a.m.
kevincox added inline comments.
kevincox accepted this revision.

INLINE COMMENTS

> nodemap.rs:37
> +/// whose data should not be owned by the `NodeMap`.
> +pub trait NodeMap {
> +    fn find_bin<'a>(

Can you please add doc-comments for this? I find that documenting trait methods is especially important.

> nodemap.rs:205
> +impl fmt::Debug for NodeTree {
> +    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> +        let blocks: &[Block] = &*self.readonly;

I don't think the `<'_>` is necessary.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 21, 2020, 9:42 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> nodemap.rs:150
> +pub struct NodeTree {
> +    readonly: Box<dyn Deref<Target = [Block]> + Send>,
> +}

I find `Deref<Target = [Block]> + Send` hard to understand. I think it would be helpful to name it. Could we at least define an alias for it?

Perhaps it's even better to define a trait for it and add named methods on that, but if those methods would just be `len()` and `index()` it probably doesn't make any sense to define the trait.

> nodemap.rs:150
> +pub struct NodeTree {
> +    readonly: Box<dyn Deref<Target = [Block]> + Send>,
> +}

Do we need `Send`? Maybe it later when you need it (unless you actually need it now and I'm just mistaken, of course).

> nodemap.rs:153
> +
> +/// Return `None` unless the `Node` for `rev` has given prefix in `index`.
> +fn has_prefix_or_none<'p>(

I understand that you picked this interface because it works well for the caller, but it feel a little weird to always return `None` or `Some(rev)` where `rev` is one of the function inputs. It's practically a boolean-valued function. Do the callers get much harder to read if you actually make it boolean-valued?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 21, 2020, 10:17 p.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:150
> I find `Deref<Target = [Block]> + Send` hard to understand. I think it would be helpful to name it. Could we at least define an alias for it?
> 
> Perhaps it's even better to define a trait for it and add named methods on that, but if those methods would just be `len()` and `index()` it probably doesn't make any sense to define the trait.

> I find Deref<Target = [Block]> + Send hard to understand. I think it would be helpful to name it. Could we at least define an alias for it?

Alas, trait aliases are not stable: rust#55628 <https://github.com/rust-lang/rust/issues/55628>.

On a more subjective note, I find this syntax quite readable as it is (as readable as Rust usually is), and since this is a single occurrence in a private field, I find it reasonable.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 21, 2020, 10:27 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> Alphare wrote in nodemap.rs:150
> > I find Deref<Target = [Block]> + Send hard to understand. I think it would be helpful to name it. Could we at least define an alias for it?
> 
> Alas, trait aliases are not stable: rust#55628 <https://github.com/rust-lang/rust/issues/55628>.
> 
> On a more subjective note, I find this syntax quite readable as it is (as readable as Rust usually is), and since this is a single occurrence in a private field, I find it reasonable.

> Alas, trait aliases are not stable: rust#55628.

I didn't know about trait aliases, but I think you can already do this:

  trait Thing : Deref<Target = [Block]> + Send  {}



> ... since this is a single occurrence in a private field, I find it reasonable.

It's one instance right now. It's 7 instances of `Box<dyn Deref<...>>` at the end of the series...

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 22, 2020, 9:43 a.m.
Alphare added inline comments.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:150
> > Alas, trait aliases are not stable: rust#55628.
> 
> I didn't know about trait aliases, but I think you can already do this:
> 
>   trait Thing : Deref<Target = [Block]> + Send  {}
> 
> 
> 
> > ... since this is a single occurrence in a private field, I find it reasonable.
> 
> It's one instance right now. It's 7 instances of `Box<dyn Deref<...>>` at the end of the series...

> I didn't know about trait aliases, but I think you can already do this:
>  `trait Thing : Deref<Target = [Block]> + Send  {}`

Using an empty supertrait will not work because fat pointers have different metadata.
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=4f1d374e426c9791cbf73421160dc7c3

> It's one instance right now. It's 7 instances of Box<dyn Deref<...>> at the end of the series...

Ah true, that is indeed verbose, but manageable. Unfortunately, from a language perspective, I think that a type macro would be our only choice, but my opinion is that 1) it's ugly and more code to maintain and 2) I'm not even sure it would work.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 22, 2020, 3:10 p.m.
gracinet added inline comments.

INLINE COMMENTS

> martinvonz wrote in nodemap.rs:150
> I find `Deref<Target = [Block]> + Send` hard to understand. I think it would be helpful to name it. Could we at least define an alias for it?
> 
> Perhaps it's even better to define a trait for it and add named methods on that, but if those methods would just be `len()` and `index()` it probably doesn't make any sense to define the trait.

I've looked into trait alias while working on this, for the exact same reason, and went through most you guys are saying about that.

Seems that this is complicated for the language because they'd like to have the
possibility to `impl` it.

> martinvonz wrote in nodemap.rs:150
> Do we need `Send`? Maybe it later when you need it (unless you actually need it now and I'm just mistaken, of course).

About `Send`, without it we can't expose `NodeTree` to the Python layer, IIRC

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 22, 2020, 4:23 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> gracinet wrote in nodemap.rs:150
> I've looked into trait alias while working on this, for the exact same reason, and went through most you guys are saying about that.
> 
> Seems that this is complicated for the language because they'd like to have the
> possibility to `impl` it.

How about something like this then?

  type BlockSource = Box<dyn Deref<Target = [Block]> + Send>;
  type ByteSource = Box<dyn Deref<Target = [u8]> + Send>;

I won't insist, so up to you if you think it helps.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7791/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7791

To: gracinet, #hg-reviewers, kevincox
Cc: Alphare, martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/revlog/nodemap.rs b/rust/hg-core/src/revlog/nodemap.rs
--- a/rust/hg-core/src/revlog/nodemap.rs
+++ b/rust/hg-core/src/revlog/nodemap.rs
@@ -12,8 +12,43 @@ 
 //! Following existing implicit conventions, the "nodemap" terminology
 //! is used in a more abstract context.
 
-use super::Revision;
+use super::{NodeError, NodePrefix, NodePrefixRef, Revision, RevlogIndex};
 use std::fmt;
+use std::ops::Deref;
+
+#[derive(Debug, PartialEq)]
+pub enum NodeMapError {
+    MultipleResults,
+    InvalidNodePrefix(NodeError),
+    /// A `Revision` stored in the nodemap could not be found in the index
+    RevisionNotInIndex(Revision),
+}
+
+impl From<NodeError> for NodeMapError {
+    fn from(err: NodeError) -> Self {
+        NodeMapError::InvalidNodePrefix(err)
+    }
+}
+
+/// Mapping system from Mercurial nodes to revision numbers.
+///
+/// Many methods in this trait work in conjunction with a `RevlogIndex`
+/// whose data should not be owned by the `NodeMap`.
+pub trait NodeMap {
+    fn find_bin<'a>(
+        &self,
+        idx: &impl RevlogIndex,
+        prefix: NodePrefixRef<'a>,
+    ) -> Result<Option<Revision>, NodeMapError>;
+
+    fn find_hex(
+        &self,
+        idx: &impl RevlogIndex,
+        prefix: &str,
+    ) -> Result<Option<Revision>, NodeMapError> {
+        self.find_bin(idx, NodePrefix::from_hex(prefix)?.borrow())
+    }
+}
 
 /// Low level NodeTree [`Blocks`] elements
 ///
@@ -112,9 +147,87 @@ 
     }
 }
 
+/// A 16-radix tree with the root block at the end
+pub struct NodeTree {
+    readonly: Box<dyn Deref<Target = [Block]> + Send>,
+}
+
+/// Return `None` unless the `Node` for `rev` has given prefix in `index`.
+fn has_prefix_or_none<'p>(
+    idx: &impl RevlogIndex,
+    prefix: NodePrefixRef<'p>,
+    rev: Revision,
+) -> Result<Option<Revision>, NodeMapError> {
+    idx.node(rev)
+        .ok_or_else(|| NodeMapError::RevisionNotInIndex(rev))
+        .map(|node| {
+            if prefix.is_prefix_of(node) {
+                Some(rev)
+            } else {
+                None
+            }
+        })
+}
+
+impl NodeTree {
+    /// Main working method for `NodeTree` searches
+    ///
+    /// This partial implementation lacks
+    /// - special cases for NULL_REVISION
+    fn lookup<'p>(
+        &self,
+        prefix: NodePrefixRef<'p>,
+    ) -> Result<Option<Revision>, NodeMapError> {
+        let blocks: &[Block] = &*self.readonly;
+        if blocks.is_empty() {
+            return Ok(None);
+        }
+        let mut visit = blocks.len() - 1;
+        for i in 0..prefix.len() {
+            let nybble = prefix.get_nybble(i);
+            match blocks[visit].read(nybble) {
+                Element::None => return Ok(None),
+                Element::Rev(r) => return Ok(Some(r)),
+                Element::Block(idx) => visit = idx,
+            }
+        }
+        Err(NodeMapError::MultipleResults)
+    }
+}
+
+impl From<Vec<Block>> for NodeTree {
+    fn from(vec: Vec<Block>) -> Self {
+        NodeTree {
+            readonly: Box::new(vec),
+        }
+    }
+}
+
+impl fmt::Debug for NodeTree {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        let blocks: &[Block] = &*self.readonly;
+        write!(f, "readonly: {:?}", blocks)
+    }
+}
+
+impl NodeMap for NodeTree {
+    fn find_bin<'a>(
+        &self,
+        idx: &impl RevlogIndex,
+        prefix: NodePrefixRef<'a>,
+    ) -> Result<Option<Revision>, NodeMapError> {
+        self.lookup(prefix.clone()).and_then(|opt| {
+            opt.map_or(Ok(None), |rev| has_prefix_or_none(idx, prefix, rev))
+        })
+    }
+}
+
 #[cfg(test)]
 mod tests {
+    use super::NodeMapError::*;
     use super::*;
+    use crate::revlog::node::{node_from_hex, Node};
+    use std::collections::HashMap;
 
     /// Creates a `Block` using a syntax close to the `Debug` output
     macro_rules! block {
@@ -160,4 +273,70 @@ 
         assert_eq!(block.read(4), Element::Rev(1));
     }
 
+    type TestIndex = HashMap<Revision, Node>;
+
+    impl RevlogIndex for TestIndex {
+        fn node(&self, rev: Revision) -> Option<&Node> {
+            self.get(&rev)
+        }
+
+        fn len(&self) -> usize {
+            self.len()
+        }
+    }
+
+    /// Pad hexadecimal Node prefix with zeros on the right, then insert
+    ///
+    /// This is just to avoid having to repeatedly write 40 hexadecimal
+    /// digits for test data.
+    fn pad_insert(idx: &mut TestIndex, rev: Revision, hex: &str) {
+        idx.insert(rev, node_from_hex(&format!("{:0<40}", hex)).unwrap());
+    }
+
+    fn sample_nodetree() -> NodeTree {
+        NodeTree::from(vec![
+            block![0: Rev(9)],
+            block![0: Rev(0), 1: Rev(9)],
+            block![0: Block(1), 1:Rev(1)],
+        ])
+    }
+
+    #[test]
+    fn test_nt_debug() {
+        let nt = sample_nodetree();
+        assert_eq!(
+            format!("{:?}", nt),
+            "readonly: \
+             [[0: Rev(9)], [0: Rev(0), 1: Rev(9)], [0: Block(1), 1: Rev(1)]]"
+        );
+    }
+
+    #[test]
+    fn test_immutable_find_simplest() -> Result<(), NodeMapError> {
+        let mut idx: TestIndex = HashMap::new();
+        pad_insert(&mut idx, 1, "1234deadcafe");
+
+        let nt = NodeTree::from(vec![block![1: Rev(1)]]);
+        assert_eq!(nt.find_hex(&idx, "1")?, Some(1));
+        assert_eq!(nt.find_hex(&idx, "12")?, Some(1));
+        assert_eq!(nt.find_hex(&idx, "1234de")?, Some(1));
+        assert_eq!(nt.find_hex(&idx, "1a")?, None);
+        assert_eq!(nt.find_hex(&idx, "ab")?, None);
+        Ok(())
+    }
+
+    #[test]
+    fn test_immutable_find_one_jump() {
+        let mut idx = TestIndex::new();
+        pad_insert(&mut idx, 9, "012");
+        pad_insert(&mut idx, 0, "00a");
+
+        let nt = sample_nodetree();
+
+        assert_eq!(nt.find_hex(&idx, "0"), Err(MultipleResults));
+        assert_eq!(nt.find_hex(&idx, "01"), Ok(Some(9)));
+        assert_eq!(nt.find_hex(&idx, "00"), Ok(Some(0)));
+        assert_eq!(nt.find_hex(&idx, "00a"), Ok(Some(0)));
+    }
+
 }