Patchwork D6631: rust-cpython: add macro for sharing references

login
register
mail settings
Submitter phabricator
Date July 10, 2019, 11:48 a.m.
Message ID <differential-rev-PHID-DREV-hnckso2cffc2maw6phg4-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/40863/
State Superseded
Headers show

Comments

phabricator - July 10, 2019, 11:48 a.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Following an experiment done by Georges Racinet, we now have a working way of
  sharing references between Python and Rust. This is needed in many points of
  the codebase, for example every time we need to expose an iterator to a
  Rust-backed Python class.
  
  In a few words, references are (unsafely) marked as `'static` and coupled
  with manual reference counting; we are doing manual borrow-checking.
  
  This changes introduces two declarative macro to help reduce boilerplate.
  While it is better than not using macros, they are not perfect. They need to:
  
  - Integrate with the garbage collector for container types (not needed
  
    	  as of yet), as stated in the docstring
  
  - Allow for leaking multiple attributes at the same time
  - Inject the `leaked_count` data attribute in `py_class`-generated
  
    	  structs
  
  - Automatically namespace the functions and attributes they generate
  
  For at least the last two points, we will need to write a procedural macro
  instead of a declarative one.
  While this reference-sharing mechanism is being ironed out I thought it best
  not to implement it yet.
  
  Lastly, and implementation detail renders our Rust-backed Python iterators too
  strict to be proper drop-in replacements, as will be illustrated in a future
  patch: if the data structure referenced by a non-depleted iterator is mutated,
  an `AlreadyBorrowed` exception is raised, whereas Python would allow it, only
  to raise a `RuntimeError` if `next` is called on said iterator. This will have
  to be addressed at some point.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

AFFECTED FILES
  rust/hg-cpython/src/dirstate/dirs_multiset.rs
  rust/hg-cpython/src/dirstate/mod.rs
  rust/hg-cpython/src/exceptions.rs
  rust/hg-cpython/src/lib.rs
  rust/hg-cpython/src/macros.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - July 16, 2019, 11:12 a.m.
kevincox added a comment.


  Is there any reason this can't be done using `Rc<RefCell<DirsMultiset>>`?  I have an example here: https://rust.godbolt.org/z/MNNR_F
  
  It uses `Rc` and `RefCell` however it does need to do some unsafe work to keep the RefCell borrowed for longer than would otherwise be easy to do. The advantages here are
  
  1. Lifetime is automatically managed by the Rc. This ensures that our parent data doesn't get deleted while the iterator is still alive.
  2. The linked implementation uses a mutable borrow. However it would be easy to make an immutable version which would allow multiple iterators over the same "container".
  3. Mutable access is managed by the RefCell so we don't need to do our own.

INLINE COMMENTS

> macros.rs:45
> +///     data inner: RefCell<MyStruct>;
> +///     data leak_count: RefCell<usize>;
> +/// });

You aren't really using the `RefCell` type here. It might be better to just use `Cell` for the count and `UnsafeCell` for the data.

> macros.rs:57
> +    ) => {
> +        impl $name {
> +            fn leak_immutable(&self, py: Python) -> &'static $inner_struct {

Instead of adding methods to the interface type I would just create a templated struct which has these methods as well as the data and count. It doesn't seem like these actually need to be in the parent type.

> macros.rs:61
> +                *self.leak_count(py).borrow_mut() += 1;
> +                unsafe { &*ptr }
> +            }

I'm failing to see what actually prevents the "container" from being dropped while the "iterator" is alive. Is this somehow tied to the Python GC?

For example:

  c = RustObject()
  i = iter(i)
  del c
  # What is keeping the backing rust memory alive at this point.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - July 16, 2019, 12:51 p.m.
gracinet added inline comments.

INLINE COMMENTS

> kevincox wrote in macros.rs:61
> I'm failing to see what actually prevents the "container" from being dropped while the "iterator" is alive. Is this somehow tied to the Python GC?
> 
> For example:
> 
>   c = RustObject()
>   i = iter(i)
>   del c
>   # What is keeping the backing rust memory alive at this point.

Hi Kevin,

all that CPython's `del` does is decrement the reference count, and if it drops to 0, then the memory will be freed immediately. On the other hand, the `clone_ref()` at line 106 does increment it.

So, in your example, after `del c` the refcount of `c` is 1 and the memory shouldn't be freed.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers
Cc: gracinet, durin42, kevincox, mercurial-devel
phabricator - July 16, 2019, 1:13 p.m.
kevincox added a comment.


  In D6631#97216 <https://phab.mercurial-scm.org/D6631#97216>, @kevincox wrote:
  
  > Is there any reason this can't be done using `Rc<RefCell<DirsMultiset>>`?  I have an example here: https://rust.godbolt.org/z/MNNR_F
  
  I see now that since python is managing the reference it does make some sense to do it this way.

INLINE COMMENTS

> macros.rs:55
> +        $data_member: ident,
> +        $leaked: ident
> +    ) => {

The terminology is weird here. You don't appear to be tracking leaks, just borrows.

> gracinet wrote in macros.rs:61
> Hi Kevin,
> 
> all that CPython's `del` does is decrement the reference count, and if it drops to 0, then the memory will be freed immediately. On the other hand, the `clone_ref()` at line 106 does increment it.
> 
> So, in your example, after `del c` the refcount of `c` is 1 and the memory shouldn't be freed.

Thanks. I missed that.

> macros.rs:85
> +                         immutable references in Python objects",
> +                    )),
> +                }

This catches a mutable borrow after other borrows, however I believe it fails to catch a mutable borrow followed by immutable borrows.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers
Cc: gracinet, durin42, kevincox, mercurial-devel
phabricator - July 24, 2019, 3:33 p.m.
Alphare added a comment.


  @kevincox I've renamed the file since it no longer just contains macros.
  
  I've moved whatever I could move to separate structs and used `Cell` instead of `RefCell` on those scalar types. I'm not so keen on using `UnsafeCell`, espacially since the target `py_class!`-resulting struct could be using `RefCell`-specific APIs.
  
  I'm not too sure on the terminology, so tell me if you have better ideas than mine.
  
  Later down the line, I could see a `trait` to help define the interface for shared structs.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers
Cc: gracinet, durin42, kevincox, mercurial-devel
phabricator - Aug. 12, 2019, 4:11 p.m.
kevincox added a comment.
kevincox accepted this revision.


  In D6631#97887 <https://phab.mercurial-scm.org/D6631#97887>, @Alphare wrote:
  
  > @kevincox I've renamed the file since it no longer just contains macros.
  > I've moved whatever I could move to separate structs and used `Cell` instead of `RefCell` on those scalar types. I'm not so keen on using `UnsafeCell`, espacially since the target `py_class!`-resulting struct could be using `RefCell`-specific APIs.
  
  Ack. it still feels like we are kinda implementing a second `RefCell` anyways but I think this code is quite clean.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers, kevincox
Cc: gracinet, durin42, kevincox, mercurial-devel
phabricator - Aug. 12, 2019, 4:54 p.m.
gracinet added a comment.


  > it still feels like we are kinda implementing a second `RefCell` anyways
  
  We all share that feeling, I think. Let's hope we'll be able to circumvent that in some future.
  
  > but I think this code is quite clean.
  
  Thanks, @Alphare is currently on leave. I'm sure he'll appreciate when he comes back.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6631/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6631

To: Alphare, #hg-reviewers, kevincox
Cc: gracinet, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-cpython/src/macros.rs b/rust/hg-cpython/src/macros.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-cpython/src/macros.rs
@@ -0,0 +1,254 @@ 
+// macros.rs
+//
+// Copyright 2019 Raphaël Gomès <rgomes@octobus.net>
+//
+// This software may be used and distributed according to the terms of the
+// GNU General Public License version 2 or any later version.
+
+//! Macros for use in the `hg-cpython` bridge library.
+
+/// Allows a `py_class!` generated struct to share references to one of its
+/// data members with Python.
+///
+/// # Warning
+///
+/// The targeted `py_class!` needs to have the
+/// `data leak_count: RefCell<usize>;` data attribute to compile.
+/// A better, more complicated macro is needed to automatically insert the
+/// leak count, which is not yet really battle tested (what happens when
+/// multiple references are needed?). See the example below.
+///
+/// TODO allow Python container types: for now, integration with the garbage
+///     collector does not extend to Rust structs holding references to Python
+///     objects. Should the need surface, `__traverse__` and `__clear__` will
+///     need to be written as per the `rust-cpython` docs on GC integration.
+///
+/// # Parameters
+///
+/// * `$name` is the same identifier used in for `py_class!` macro call.
+/// * `$inner_struct` is the identifier of the underlying Rust struct
+/// * `$data_member` is the identifier of the data member of `$inner_struct`
+/// that will be shared.
+/// * `$leaked` is the identifier to give to the struct that will manage
+/// references to `$name`, to be used for example in other macros like
+/// `py_shared_mapping_iterator`.
+///
+/// # Example
+///
+/// ```
+/// struct MyStruct {
+///     inner: Vec<u32>;
+/// }
+///
+/// py_class!(pub class MyType |py| {
+///     data inner: RefCell<MyStruct>;
+///     data leak_count: RefCell<usize>;
+/// });
+///
+/// py_shared_ref!(MyType, MyStruct, inner, MyTypeLeakedRef);
+/// ```
+macro_rules! py_shared_ref {
+    (
+        $name: ident,
+        $inner_struct: ident,
+        $data_member: ident,
+        $leaked: ident
+    ) => {
+        impl $name {
+            fn leak_immutable(&self, py: Python) -> &'static $inner_struct {
+                let ptr = self.$data_member(py).as_ptr();
+                *self.leak_count(py).borrow_mut() += 1;
+                unsafe { &*ptr }
+            }
+
+            fn borrow_mut<'a>(
+                &'a self,
+                py: Python<'a>,
+            ) -> PyResult<RefMut<$inner_struct>> {
+                match *self.leak_count(py).borrow() {
+                    0 => Ok(self.$data_member(py).borrow_mut()),
+                    // TODO
+                    // For now, this works differently than Python references
+                    // in the case of iterators.
+                    // Python does not complain when the data an iterator
+                    // points to is modified if the iterator is never used
+                    // afterwards.
+                    // Here, we are stricter than this by refusing to give a
+                    // mutable reference if it is already borrowed.
+                    // While the additional safety might be argued for, it
+                    // breaks valid programming patterns in Python and we need
+                    // to fix this issue down the line.
+                    _ => Err(AlreadyBorrowed::new(
+                        py,
+                        "Cannot borrow mutably while there are \
+                         immutable references in Python objects",
+                    )),
+                }
+            }
+
+            fn decrease_leak_count(&self, py: Python) {
+                *self.leak_count(py).borrow_mut() -= 1;
+            }
+        }
+
+        /// Manage immutable references to `$name` leaked into Python
+        /// iterators.
+        ///
+        /// In truth, this does not represent leaked references themselves;
+        /// it is instead useful alongside them to manage them.
+        pub struct $leaked {
+            inner: $name,
+        }
+
+        impl $leaked {
+            fn new(py: Python, inner: &$name) -> Self {
+                Self {
+                    inner: inner.clone_ref(py),
+                }
+            }
+        }
+
+        impl Drop for $leaked {
+            fn drop(&mut self) {
+                let gil = Python::acquire_gil();
+                let py = gil.python();
+                self.inner.decrease_leak_count(py);
+            }
+        }
+    };
+}
+
+/// Defines a `py_class!` that acts as a Python iterator over a Rust iterator.
+macro_rules! py_shared_iterator_impl {
+    (
+        $name: ident,
+        $leaked: ident,
+        $iterator_type: ty,
+        $success_func: expr,
+        $success_type: ty
+    ) => {
+        py_class!(pub class $name |py| {
+            data inner: RefCell<Option<$leaked>>;
+            data it: RefCell<$iterator_type>;
+
+            def __next__(&self) -> PyResult<$success_type> {
+                let mut inner_opt = self.inner(py).borrow_mut();
+                if inner_opt.is_some() {
+                    match self.it(py).borrow_mut().next() {
+                        None => {
+                            // replace Some(inner) by None, drop $leaked
+                            inner_opt.take();
+                            Ok(None)
+                        }
+                        Some(res) => {
+                            $success_func(py, res)
+                        }
+                    }
+                } else {
+                    Ok(None)
+                }
+            }
+
+            def __iter__(&self) -> PyResult<Self> {
+                Ok(self.clone_ref(py))
+            }
+        });
+
+        impl $name {
+            pub fn from_inner(
+                py: Python,
+                leaked: Option<$leaked>,
+                it: $iterator_type
+            ) -> PyResult<Self> {
+                Self::create_instance(
+                    py,
+                    RefCell::new(leaked),
+                    RefCell::new(it)
+                )
+            }
+        }
+    };
+}
+
+/// Defines a `py_class!` that acts as a Python mapping iterator over a Rust
+/// iterator.
+///
+/// TODO: this is a bit awkward to use, and a better (more complicated)
+///     procedural macro would simplify the interface a lot.
+///
+/// # Parameters
+///
+/// * `$name` is the identifier to give to the resulting Rust struct.
+/// * `$leaked` corresponds to `$leaked` in the matching `py_shared_ref!` call.
+/// * `$iterator_type` is the iterator type
+/// (like `std::collections::hash_map::Iter`).
+/// * `$key_type` is the type of the key in the mapping
+/// * `$value_type` is the type of the value in the mapping
+/// * `$success_func` is a function for processing the Rust `(key, value)`
+/// tuple on iteration success, turning it into something Python understands.
+/// * `$success_func` is the return type of `$success_func`
+///
+/// # Example
+///
+/// ```
+/// struct MyStruct {
+///     inner: HashMap<Vec<u8>, Vec<u8>>;
+/// }
+///
+/// py_class!(pub class MyType |py| {
+///     data inner: RefCell<MyStruct>;
+///     data leak_count: RefCell<usize>;
+///
+///     def __iter__(&self) -> PyResult<MyTypeItemsIterator> {
+///         MyTypeItemsIterator::create_instance(
+///             py,
+///             RefCell::new(Some(MyTypeLeakedRef::new(py, &self))),
+///             RefCell::new(self.leak_immutable(py).iter()),
+///         )
+///     }
+/// });
+///
+/// impl MyType {
+///     fn translate_key_value(
+///         py: Python,
+///         res: (&Vec<u8>, &Vec<u8>),
+///     ) -> PyResult<Option<(PyBytes, PyBytes)>> {
+///         let (f, entry) = res;
+///         Ok(Some((
+///             PyBytes::new(py, f),
+///             PyBytes::new(py, entry),
+///         )))
+///     }
+/// }
+///
+/// py_shared_ref!(MyType, MyStruct, inner, MyTypeLeakedRef);
+///
+/// py_shared_mapping_iterator!(
+///     MyTypeItemsIterator,
+///     MyTypeLeakedRef,
+///     std::collections::hash_map::Iter,
+///     Vec<u8>,
+///     Vec<u8>,
+///     MyType::translate_key_value,
+///     Option<(PyBytes, PyBytes)>
+/// );
+/// ```
+macro_rules! py_shared_mapping_iterator {
+    (
+        $name:ident,
+        $leaked:ident,
+        $iterator_type: ident,
+        $key_type: ty,
+        $value_type: ty,
+        $success_func: path,
+        $success_type: ty
+    ) => {
+        py_shared_iterator_impl!(
+            $name,
+            $leaked,
+            $iterator_type<'static, $key_type, $value_type>,
+            $success_func,
+            $success_type
+        );
+    };
+}
\ No newline at end of file
diff --git a/rust/hg-cpython/src/lib.rs b/rust/hg-cpython/src/lib.rs
--- a/rust/hg-cpython/src/lib.rs
+++ b/rust/hg-cpython/src/lib.rs
@@ -27,6 +27,8 @@ 
 pub mod ancestors;
 mod cindex;
 mod conversion;
+#[macro_use]
+pub mod macros;
 pub mod dagops;
 pub mod dirstate;
 pub mod parsers;
diff --git a/rust/hg-cpython/src/exceptions.rs b/rust/hg-cpython/src/exceptions.rs
--- a/rust/hg-cpython/src/exceptions.rs
+++ b/rust/hg-cpython/src/exceptions.rs
@@ -65,3 +65,5 @@ 
         }
     }
 }
+
+py_exception!(shared_ref, AlreadyBorrowed, RuntimeError);
diff --git a/rust/hg-cpython/src/dirstate/mod.rs b/rust/hg-cpython/src/dirstate/mod.rs
--- a/rust/hg-cpython/src/dirstate/mod.rs
+++ b/rust/hg-cpython/src/dirstate/mod.rs
@@ -29,6 +29,7 @@ 
 mod dirs_multiset;
 use dirstate::dirs_multiset::Dirs;
 use std::convert::TryFrom;
+use exceptions::AlreadyBorrowed;
 
 /// C code uses a custom `dirstate_tuple` type, checks in multiple instances
 /// for this type, and raises a Python `Exception` if the check does not pass.
@@ -99,6 +100,7 @@ 
     m.add(py, "__doc__", "Dirstate - Rust implementation")?;
 
     m.add_class::<Dirs>(py)?;
+    m.add(py, "AlreadyBorrowed", py.get_type::<AlreadyBorrowed>())?;
 
     let sys = PyModule::import(py, "sys")?;
     let sys_modules: PyDict = sys.get(py, "modules")?.extract(py)?;
diff --git a/rust/hg-cpython/src/dirstate/dirs_multiset.rs b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
--- a/rust/hg-cpython/src/dirstate/dirs_multiset.rs
+++ b/rust/hg-cpython/src/dirstate/dirs_multiset.rs
@@ -8,22 +8,25 @@ 
 //! Bindings for the `hg::dirstate::dirs_multiset` file provided by the
 //! `hg-core` package.
 
-use std::cell::RefCell;
+use std::cell::{RefCell, RefMut};
+use std::collections::hash_map::Iter;
+use std::convert::TryInto;
 
 use cpython::{
-    exc, ObjectProtocol, PyBytes, PyDict, PyErr, PyObject, PyResult,
-    ToPyObject,
+    exc, ObjectProtocol, PyBytes, PyClone, PyDict, PyErr, PyObject, PyResult,
+    Python,
 };
 
 use dirstate::extract_dirstate;
+use exceptions::AlreadyBorrowed;
 use hg::{
     DirsIterable, DirsMultiset, DirstateMapError, DirstateParseError,
     EntryState,
 };
-use std::convert::TryInto;
 
 py_class!(pub class Dirs |py| {
-    data dirs_map: RefCell<DirsMultiset>;
+    data inner: RefCell<DirsMultiset>;
+    data leak_count: RefCell<usize>;
 
     // `map` is either a `dict` or a flat iterator (usually a `set`, sometimes
     // a `list`)
@@ -59,18 +62,18 @@ 
             )
         };
 
-        Self::create_instance(py, RefCell::new(inner))
+        Self::create_instance(py, RefCell::new(inner), RefCell::new(0))
     }
 
     def addpath(&self, path: PyObject) -> PyResult<PyObject> {
-        self.dirs_map(py).borrow_mut().add_path(
+        self.borrow_mut(py)?.add_path(
             path.extract::<PyBytes>(py)?.data(py),
         );
         Ok(py.None())
     }
 
     def delpath(&self, path: PyObject) -> PyResult<PyObject> {
-        self.dirs_map(py).borrow_mut().delete_path(
+        self.borrow_mut(py)?.delete_path(
             path.extract::<PyBytes>(py)?.data(py),
         )
             .and(Ok(py.None()))
@@ -89,30 +92,43 @@ 
             })
     }
 
-    // This is really inefficient on top of being ugly, but it's an easy way
-    // of having it work to continue working on the rest of the module
-    // hopefully bypassing Python entirely pretty soon.
-    def __iter__(&self) -> PyResult<PyObject> {
-        let dict = PyDict::new(py);
-
-        for (key, value) in self.dirs_map(py).borrow().iter() {
-            dict.set_item(
-                py,
-                PyBytes::new(py, &key[..]),
-                value.to_py_object(py),
-            )?;
-        }
-
-        let locals = PyDict::new(py);
-        locals.set_item(py, "obj", dict)?;
-
-        py.eval("iter(obj)", None, Some(&locals))
+    def __iter__(&self) -> PyResult<DirsMultisetKeysIterator> {
+        DirsMultisetKeysIterator::create_instance(
+            py,
+            RefCell::new(Some(DirsMultisetLeakedRef::new(py, &self))),
+            RefCell::new(self.leak_immutable(py).iter()),
+        )
     }
 
     def __contains__(&self, item: PyObject) -> PyResult<bool> {
         Ok(self
-            .dirs_map(py)
+            .inner(py)
             .borrow()
             .contains_key(item.extract::<PyBytes>(py)?.data(py).as_ref()))
     }
 });
+
+py_shared_ref!(Dirs, DirsMultiset, inner, DirsMultisetLeakedRef);
+
+impl Dirs {
+    pub fn from_inner(py: Python, d: DirsMultiset) -> PyResult<Self> {
+        Self::create_instance(py, RefCell::new(d), RefCell::new(0))
+    }
+
+    fn translate_key(
+        py: Python,
+        res: (&Vec<u8>, &u32),
+    ) -> PyResult<Option<PyBytes>> {
+        Ok(Some(PyBytes::new(py, res.0)))
+    }
+}
+
+py_shared_mapping_iterator!(
+    DirsMultisetKeysIterator,
+    DirsMultisetLeakedRef,
+    Iter,
+    Vec<u8>,
+    u32,
+    Dirs::translate_key,
+    Option<PyBytes>
+);