Patchwork D7790: rust-node: handling binary Node prefix

login
register
mail settings
Submitter phabricator
Date Jan. 6, 2020, 7:25 p.m.
Message ID <differential-rev-PHID-DREV-p7g75y5uoklvlc7n2gmk-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44150/
State New
Headers show

Comments

phabricator - Jan. 6, 2020, 7:25 p.m.
gracinet created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Parallel to the inner signatures of the nodetree functions in
  revlog.c, we'll have to handle prefixes of `Node` in binary
  form.
  
  There's a complication due to the fact that we'll be sometimes
  interested in prefixes with an odd number of hexadecimal digits,
  which translates in binary form by a last byte in which only the
  highest weight 4 bits are considered.
  
  There are a few candidates for inlining here, but we refrain from
  such premature optimizations, letting the compiler decide.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

AFFECTED FILES
  rust/hg-core/src/revlog.rs
  rust/hg-core/src/revlog/node.rs

CHANGE DETAILS




To: gracinet, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Jan. 18, 2020, 4:16 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> node.rs:62-64
> +/// Since it can potentially come from an hexadecimal representation with
> +/// odd length, it needs to carry around whether the last 4 bits are relevant
> +/// or not.

Did you consider using a full u8 for each nibble instead? It seems like that would be simpler. I'd appreciate it if you could spend a few minutes to try that (if you haven't already).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 20, 2020, 1:25 p.m.
gracinet added a comment.


  @martinvonz in this code, we're in competition with the C implementation, which does something similar.
  
  Switching to a full u8 per nybble is more than a few minutes, it means changing the ownership model completely. Also, it introduces a new allocation and a copy instead of merely taking a reference.
  
  So, it's more work and has performance impact that we have no means to measure.
  
  The odd/even thing is not *that* complicated. It's very bad for readability in the C code, but it's nicely encapsulated in the Rust case and fully tested. I'd be more comfortable trying what you're suggesting once we have real-life measurements.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 21, 2020, 5:55 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> node.rs:79
> +        let is_odd = len % 2 == 1;
> +        let mut buf: Vec<u8> = Vec::with_capacity(20);
> +        for i in 0..len / 2 {

Why not use `(len + 1) / 2` as capacity?

> node.rs:89
> +
> +    pub fn borrow<'a>(&'a self) -> NodePrefixRef<'a> {
> +        NodePrefixRef {

Is this lifetime parameter needed?

> node.rs:136
> +        NodePrefixRef {
> +            buf: &*node,
> +            is_odd: false,

What does the `&*` do? Specifically, what's different if you drop that part?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel
phabricator - Jan. 21, 2020, 5:55 p.m.
martinvonz added a comment.


  In D7790#116757 <https://phab.mercurial-scm.org/D7790#116757>, @gracinet wrote:
  
  > @martinvonz in this code, we're in competition with the C implementation, which does something similar.
  > Switching to a full u8 per nybble is more than a few minutes, it means changing the ownership model completely. Also, it introduces a new allocation and a copy instead of merely taking a reference.
  
  Depends on the definition of `NodePrefixRef`. I don't think there would be any extra allocation if you define it like this:
  
    pub enum NodePrefixRef<'a> {
        Full(&'a Node),
        Prefix(&'a NodePrefix),
    }
  
  
  
  > So, it's more work and has performance impact that we have no means to measure. 
  > The odd/even thing is not *that* complicated. It's very bad for readability in the C code, but it's nicely encapsulated in the Rust case and fully tested. I'd be more comfortable trying what you're suggesting once we have real-life measurements.
  
  Sure, please revisit it when you can do measurements.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7790/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7790

To: gracinet, #hg-reviewers
Cc: martinvonz, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/revlog/node.rs b/rust/hg-core/src/revlog/node.rs
--- a/rust/hg-core/src/revlog/node.rs
+++ b/rust/hg-core/src/revlog/node.rs
@@ -19,6 +19,7 @@ 
 #[derive(Debug, PartialEq)]
 pub enum NodeError {
     ExactLengthRequired(String),
+    PrefixTooLong(String),
     NotHexadecimal,
 }
 
@@ -56,6 +57,88 @@ 
     }
 }
 
+/// The beginning of a binary revision SHA.
+///
+/// Since it can potentially come from an hexadecimal representation with
+/// odd length, it needs to carry around whether the last 4 bits are relevant
+/// or not.
+#[derive(Debug, PartialEq)]
+pub struct NodePrefix {
+    buf: Vec<u8>,
+    is_odd: bool,
+}
+
+impl NodePrefix {
+    /// Conversion from hexadecimal string representation
+    pub fn from_hex(hex: &str) -> Result<Self, NodeError> {
+        let len = hex.len();
+        if len > 40 {
+            return Err(NodeError::PrefixTooLong(hex.to_string()));
+        }
+        let is_odd = len % 2 == 1;
+        let mut buf: Vec<u8> = Vec::with_capacity(20);
+        for i in 0..len / 2 {
+            buf.push(u8::from_str_radix(&hex[i * 2..i * 2 + 2], 16)?);
+        }
+        if is_odd {
+            buf.push(u8::from_str_radix(&hex[len - 1..], 16)? << 4);
+        }
+        Ok(NodePrefix { buf, is_odd })
+    }
+
+    pub fn borrow<'a>(&'a self) -> NodePrefixRef<'a> {
+        NodePrefixRef {
+            buf: &self.buf,
+            is_odd: self.is_odd,
+        }
+    }
+}
+
+#[derive(Clone, Debug, PartialEq)]
+pub struct NodePrefixRef<'a> {
+    buf: &'a [u8],
+    is_odd: bool,
+}
+
+impl<'a> NodePrefixRef<'a> {
+    pub fn len(&self) -> usize {
+        if self.is_odd {
+            self.buf.len() * 2 - 1
+        } else {
+            self.buf.len() * 2
+        }
+    }
+
+    pub fn is_prefix_of(&self, node: &Node) -> bool {
+        if self.is_odd {
+            let buf = self.buf;
+            let last_pos = buf.len() - 1;
+            node.starts_with(buf.split_at(last_pos).0)
+                && node[last_pos] >> 4 == buf[last_pos] >> 4
+        } else {
+            node.starts_with(self.buf)
+        }
+    }
+
+    /// Retrieve the `i`th half-byte from the prefix.
+    ///
+    /// This is also the `i`th hexadecimal digit in numeric form,
+    /// also called a [nybble](https://en.wikipedia.org/wiki/Nibble).
+    pub fn get_nybble(&self, i: usize) -> u8 {
+        get_nybble(i, self.buf)
+    }
+}
+
+/// A shortcut for full `Node` references
+impl<'a> From<&'a Node> for NodePrefixRef<'a> {
+    fn from(node: &'a Node) -> Self {
+        NodePrefixRef {
+            buf: &*node,
+            is_odd: false,
+        }
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -88,4 +171,68 @@ 
             SAMPLE_NODE_HEX
         );
     }
+
+    #[test]
+    fn test_prefix_from_hex() -> Result<(), NodeError> {
+        assert_eq!(
+            NodePrefix::from_hex("0e1")?,
+            NodePrefix {
+                buf: vec![14, 16],
+                is_odd: true
+            }
+        );
+        assert_eq!(
+            NodePrefix::from_hex("0e1a")?,
+            NodePrefix {
+                buf: vec![14, 26],
+                is_odd: false
+            }
+        );
+
+        // checking limit case
+        assert_eq!(
+            NodePrefix::from_hex(SAMPLE_NODE_HEX)?,
+            NodePrefix {
+                buf: node_from_hex(SAMPLE_NODE_HEX)?.iter().cloned().collect(),
+                is_odd: false
+            }
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_prefix_from_hex_errors() {
+        assert_eq!(
+            NodePrefix::from_hex("testgr"),
+            Err(NodeError::NotHexadecimal)
+        );
+        let long = "000000000000000000000000000000000000000000000";
+        match NodePrefix::from_hex(long)
+            .expect_err("should be refused as too long")
+        {
+            NodeError::PrefixTooLong(s) => assert_eq!(s, long),
+            err => panic!(format!("Should have been TooLong, got {:?}", err)),
+        }
+    }
+
+    #[test]
+    fn test_is_prefix_of() -> Result<(), NodeError> {
+        let mut node: Node = [0; 20];
+        node[0] = 0x12;
+        node[1] = 0xca;
+        assert!(NodePrefix::from_hex("12")?.borrow().is_prefix_of(&node));
+        assert!(!NodePrefix::from_hex("1a")?.borrow().is_prefix_of(&node));
+        assert!(NodePrefix::from_hex("12c")?.borrow().is_prefix_of(&node));
+        assert!(!NodePrefix::from_hex("12d")?.borrow().is_prefix_of(&node));
+        Ok(())
+    }
+
+    #[test]
+    fn test_get_nybble() -> Result<(), NodeError> {
+        let prefix = NodePrefix::from_hex("dead6789cafe")?;
+        assert_eq!(prefix.borrow().get_nybble(0), 13);
+        assert_eq!(prefix.borrow().get_nybble(7), 9);
+        Ok(())
+    }
 }
diff --git a/rust/hg-core/src/revlog.rs b/rust/hg-core/src/revlog.rs
--- a/rust/hg-core/src/revlog.rs
+++ b/rust/hg-core/src/revlog.rs
@@ -7,7 +7,7 @@ 
 
 pub mod node;
 pub mod nodemap;
-pub use node::{Node, NodeError};
+pub use node::{Node, NodeError, NodePrefix, NodePrefixRef};
 
 /// Mercurial revision numbers
 ///